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We present the determination of a set of parton distributions of the nucleon, at next- 
to-leading order, from a global set of deep-inelastic scattering data: NNPDFl.O. The de- 
termination is based on a Monte Carlo approach, with neural networks used as unbiased 
5^ \ interpolants. This method, previously discussed by us and applied to a determination of 

the nonsinglet quark distribution, is designed to provide a faithful and statistically sound 
representation of the uncertainty on parton distributions. We discuss our dataset, its sta- 
tistical features, and its Monte Carlo representation. We summarize the technique used 
to solve the evolution equations and its benchmarking, and the method used to compute 
physical observables. We discuss the parametrization and fitting of neural networks, and 
the algorithm used to determine the optimal fit. We finally present our set of parton 
distributions. We discuss its statistical properties, test for its stability upon various mod- 
ifications of the fitting procedure, and compare it to other recent parton sets. We use it to 
compute the benchmark W and Z cross sections at the LHC. We discuss issues of delivery 
and interfacing to commonly used packages such as LHAPDF. 
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1 Introduction 



1.1 Determination of parton distributions 

The determination of parton distributions has gone through various phases which mirror 
the evolution of theoretical and phenomenological understanding of the theory of strong 
interactions. At a very early stage [1-6], parton distributions were determined through 
a combination of general physical principles (as embodied in sum rules), model assump- 
tions and the first crude experimental information coming from Bjorken scaling and its 
violation. These determinations were semi-quantitative at best, and they were aimed at 
showing the compatibility of the data with the partonic interpretation of hard processes. 
The parton sets were used to compare the observed scaling violations with those predicted 
by perturbative QCD [2-6], thereby leading to first tests of the theory of strong interac- 
tions. These early investigations met with such success that the parton set of Buras and 
Gaemers [7] is sometimes still used today [8]. 

As the accuracy of the data and the confidence in perturbative QCD improved, the 
gluon distribution was extracted from scaling violations [9] , and first parton sets based on 
consistent global fits were performed [10, 11]. Despite the availability of next-to- leading 
order evolution tools [12], these analyses were performed at leading order, which was 
accurate enough for these sets to be widely used for phenomenology in the ensuing decade. 

However, thanks to a second generation of high-precision deep-inelastic scattering [13] 
and hadron collider [14] experiments, QCD gradually evolved towards being viewed as 
precision physics — an integral part of the standard model. This required an approach to 
parton determination based on next-to-leading order theory (in order to have perturbative 
uncertainties under control) , and also based on fairly wide "global" sets of data of a varied 
nature, in order to minimize as much as possible the role of theoretical prejudice in the 
determination of the shape of the parton distributions at the initial scale [15-18]. 

Next-to-lcading order parton sets evolved into standard analysis tools and were con- 
stantly updated throughout the ensuing decade [19]. In particular, the wealth of data 
from the HERA collider [20] led to a considerable increase in the size of the kinematic 
region over which parton distributions could be determined, along with a substantial im- 
provement in accuracy, especially in the determination of quantities which are sensitive to 
scaling violations. Accumulated knowledge eventually led to parton sets (such as as the 
CTEQ5 [21] and MRST2001 [22] sets) very likely to have an accuracy comparable to that 
of next-to-leading order QCD computation, adequate for the determination of most hard 
processes at collider energies. These parton sets differ in many technical details, but are 
rooted in a similar approach: a parton paramctrization is assumed, based on the functional 
form f{x) ~ a;"(l — x)^ (used since the earliest investigations [1]), and its parameters are 
then tuned so that the various computed observables fit the experimental data. 

With parton distributions now a tool for precision physics, it becomes important to be 
able to assess accurately the uncertainty on any given parton set. This need was recognized 
at a relatively early stage, and in fact the parton set of Ref. [16] included error parton sets 
along with average ones. However, providing error estimates which can be relied upon 
raises many subtle issues, the most obvious of which is the need for a full treatment of 
correlated uncertainties of the underlying data. In the absence of a full understanding of 
the problem, the only way of estimating the uncertainty related to the parton distribution 
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was to compare results obtained with several parton sets, an especially unsatisfactory- 
procedure given that many possible sources of systematic bias are likely to be common to 
several parton determinations. 

Some first determinations of parton distributions with uncertainties were obtained by 
only fitting to restricted data sets (typically from a subset of deep-inelastic experiments), 
but retaining all the information on the correlated uncertainties in the underlying data, 
and propagating it through the fitting procedure [23-25]. The need for a systematic 
approach which could lead to parton distributions with reliable uncertainty estimation 
was stressed in the seminal papers Ref. [26,27], where an entirely different approach to 
parton determination was suggested, based on Bayesian inference combined with a Monte 
Carlo approach. While the approach of Ref. [27] was never fully implemented, the need 
for parton sets with uncertainties is now generally recognized, and there are currently 
at least three sets of parton distributions with uncertainties available, maintained by the 
CTEQ [28-32], MRST-MSTW [33-36] and Alekhin [37-40] groups. 

Parton distributions with uncertainties [28, 33, 40] have now become standard. Nev- 
ertheless, many of the problems raised in Refs. [26, 27] are still only partly solved. In 
particular, benchmark comparisons performed between some of these sets [41] have shown 
that the uncertainties that come with them are not easily interpreted in a statistical 
sense, in that they are to a significant amount determined or constrained by theoretical or 
phenomenological expectations. Indeed, whereas uncertainty bands for parton determina- 
tions based on restricted data sets [40] are obtained by using standard error propagation 
of one-sigma contours, those for global fits which include a large variety of data [28, 33] 
are obtained on the basis of a tolerance [28] , determined by studying the compatibility of 
the data with each other and with the underlying theory. The effect of this tolerance is 
equivalent to multiplying experimental errors by a factor between four and six. 

This state of affairs might be the inevitable consequences of incompatibilities between 
data and, possibly, of inadequacy of the theory used to describe them. Be that as it may, 
the standard parton determination method based on fitting a particular functional form 
does not seem to be sufficiently flexible to ascertain whether this is the case: in the absence 
of a term of comparison, it is hard to tell to which extent the current difficulties are due 
to an intrinsic limitation of the methodology. 

An altogether new approach was proposed in Ref. [42]. The general aim of this ap- 
proach is to determine objectively both the value and the uncertainty of a function (or 
set of functions) from a discrete set of many independent (and possibly incompatible) ex- 
perimental measurements. Its viability was originally demonstrated by using it to provide 
a determination of the structure function F2{x,Q^) of the proton and neutron from its 
direct measurement at around 600 points, each by two independent experiments [42]. The 
method was then used in Ref. [43] to provide a state-of-the art determination of the same 
structure function for the proton, by combining almost 2000 different measurements in 13 
different data sets, thereby addressing issues of data incompatibility. Finally, in Ref. [44] 
it was used to provide the determination of a single parton distribution (the nonsinglet 
quark distribution), thereby addressing the issue of determining a quantity which is not 
measured directly, but rather related through theory to an experimental observable. In 
the present paper, we use this method for the construction of a first parton set from deep- 
inelastic data: we determine five parton distributions from around 3000 measurements in 
25 different data sets. 
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Figure 1: Schematic representation of the NNPDF approach. 



1.2 The NNPDF approach 

The approach adopted here for the determination of parton distributions is based on a 
combination of a Monte Carfo method with the use of neural networks as basic interpo- 
lating functions. The general idea is twofold: first, problems related to the possibility of 
non-gaussian errors and nontrivial error propagation are best addressed through the use 
of a representation whereby central values are obtained from a Monte Carlo sample as av- 
erages, uncertainties as standard deviations, and so forth. Second, problems which require 
the reconstruction of a function through its discrete sampling, without making assumptions 
on its functional form, are best addressed using neural networks as unbiased interpolants. 
The combination of these two techniques works well in situations where data are partly 
inconsistent, in that neural networks are well suited to the separation of a smooth signal 
from background fluctuations, while the Monte Carlo handles the fluctuations themselves. 

The strategy is summarized in Figured! and it involves two stages. In the first stage, 
one generates a Monte Carlo ensemble of replicas of the original data. This ensemble is 
generated with the probability distribution of the data, and it is large enough that the 
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statistical properties of the data are reproduced to the desired accuracy. In practice, most 
data are given with multigaussian probabihty distributions of statistical and systematic 
errors, described by a covariance matrix and a normalization error, and in such cases 
this is the distribution that will be used to generate the pseudodata. However, any other 
probability distribution can be used if and when required by the experimental data. Each 
element in the Monte Carlo set is a replica of the experimental data: each replica contains 
as many data points as are originally available. The ensemble contains all the available 
experimental information, which can be reproduced by performing statistical operations 
on the replicas which form the ensemble. That indeed the given ensemble has the desired 
statistical features can be verified by means of standard tests, such as comparison of 
quantities calculated from it with the original properties of the data: this is denoted in 
Fig. [1] as "tests exp-art" , namely, the comparison of experimental and artificial data. 

In the second stage, a set of parton distributions is constructed from each replica of 
the data. Each parton distribution function (PDF) at a given scale is parametrized by an 
individual neural network: the neural network is just an especially convenient functional 
form of parton parametrization, used in place of the usual functional forms. Physical 
observables are computed from parton distributions in the usual way. One first chooses a 
basis set of initial parton distributions, typically smaller than the maximal set of twelve 
quarks and antiquarks plus one gluon — here we use a set of five independent parton dis- 
tributions. One then evolves from the initial scale to the scale at which data are available 
by using standard QCD evolution equations, and physical observables are computed by 
convoluting the evolved parton distributions with hard partonic cross sections. The best 
fit set of parton distribution is finally determined by comparing the theoretical compu- 
tation of the observable for a given PDF set with their replica experimental values. The 
experimental values will of course be different in each replica — they will fluctuate ac- 
cording to their distribution in the Monte Carlo ensemble — and the best fit PDFs will be 
accordingly different for each replica. The ensemble of these best fit PDFs, which contains 
as many elements as the set of replicas of the data that were generated, is the final result 
of the parton determination. 

The way in which the best fit set of PDFs is determined from each data replica is 
especially important. A first obvious requirement is that the best fit be independent of any 
assumptions made about the parton parametrization. This requirement is met by adopting 
a redundant parametrization: the size of the neural networks used, i.e. the number of 
parameters used to parametrize them, is much larger than the minimum required in order 
to reproduce the data. This redundancy may be checked a posteriori, by verifying that 
results are independent of the size and architecture of the neural network. 

A more subtle issue is that of establishing how the best fit is to be determined. A 
first possible answer might be to determine the best fit as the absolute minimum of the 

(i.e. absolute maximum of the likelihood) of the comparison between theory and data 
for a given replica. As already pointed out in Ref. [42], however, this procedure does not 
produce the optimal fit for quantities with some built-in smoothness, such as physical 
cross sections. Indeed, even for fully compatible data, independent measurements of the 
same quantity at the same point will fluctuate within the uncertainty of the measurement. 
If fltted by maximum likelihood, such independent measurements will automatically be 
combined into their weighted average [45]. However, assume now that two independent 
measurements are performed of the same observable, but measured at very close values 
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of the underlying kinematic variables: for example the structm'e function F[x,Q'^) at 
the same and two different but close values x. Then a fit which goes through the 
central values of both measurements might be possible, but in the limit in which the two 
measurements are performed at infinitesimally close points this would correspond to a 
discontinuous behaviour of the observable, which is surely unphysical. This problem is 
exacerbated in the case of incompatible measurements. 

It was thus suggested already in Ref. [42] that the best fit should be characterized by 
a value of the which is not as small as possible, but rather equal to the value expected 
on the basis of the fluctuations of the data. In order to determine this value, a strategy 
was developed in Ref. [44], based on the cross-validation method used quite generally in 
neural network studies [46]. Namely, for each replica, the data are divided randomly into 
a training set and a validation set. The fit is then performed on the data in the training 
set, and the computed from data in both sets is monitored. Minimization is stopped 
when the iii the validation set (not used for fitting) stops decreasing. The method is 
made possible by the availability of a very large and mostly compatible set of data, and it 
guarantees that the best fit does not attempt to reproduce random fluctuations of the data. 
The method also handles incompatible data, by automatically tolerating fluctuations in 
the data even when they are larger than the nominal uncertainty, whenever fitting these 
fluctuations would not lead to an improvement of the global quality of the fit. This 
analysis is denoted as "tests net-art" in Fig. [H namely, the comparison of neural net to 
the previously generated artificial data. 

An important feature of this approach is that many issues of parton determination can 
be now addressed using standard statistical tools. For example, the stability of results 
upon a change of parametrization can be verified by computing the distance between 
results in units of their standard deviation. An advantage of the Monte Carlo approach is 
that it is no more difficult to do this for uncertainties, correlation coefficients or even more 
indirect quantities, than it is for values of physical observables. Likewise, it is possible 
to verify that fits performed by removing data from the set have wider error bands but 
remain compatible within these enlarged uncertainties, and so forth. The reliability of 
the results, in particular for the uncertainties on the PDFs, can thus be assessed directly. 
This assessment is denoted as "tests net-net" in Fig. [H 

Of course, it is also possible to perform the standard tests based on the comparison of 
the final fit prediction with the original input data set. The most fundamental of these is 
the comparison to the data (and the computation of the corresponding x^) for the best 
fit obtained by averaging results over all neural nets in the final sample. These tests are 
denoted as "tests net-exp" in Fig. [TJ the comparison of the final set of neural nest to the 
starting experimental data set. 

The implementation of this approach for the parton fit presented here follows the 
principles and techniques presented in Ref. [44] . In this paper we present the adaptations 
which are needed in order to go from the determination of a single parton distribution to 
that of a full set. We also present in detail the features and results of this specific PDF 
determination, NNPDFl.O, and in particular the results of the tests discussed above. 

The paper is organized as follows. In Section [2] we present the data which will be 
used in our determination, along with their main statistical features, and the kinematic 
cuts that we have applied to them, and the results of tests to verify that the Monte 
Carlo sample provides a faithful representation of these data. In Section [3] we summarize 
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the method we use to solve evolution equations (already introduced in Ref. [44] in the 
nonsinglet case), provide results from its benchmarking, describe the specific basis of PDFs 
we use, and discuss the computation of physical observables with the inclusion target-mass 
corrections. Full details of the hard kernels used to construct the physical observables are 
given in Appendix |X1 In Section H] we turn to the neural network parametrization and 
minimization: we summarize the structure of neural networks used, the weighted genetic 
algorithm which has been employed to train them, and the stopping algorithm used to 
determine the best fit along the lines discussed above (already introduced in Ref. [44]) and 
present the parameter settings and specific features used in the current fits. In Section [5] 
we discuss in detail our results: we discuss the general statistical features of our reference 
parton set, and present the individual PDFs and their correlations; we discuss the results 
of various stability tests related to the architecture of neural network (size and choice 
of the preprocessing function) and the dataset (kinematic cuts and reduced datasets); 
we study theoretical uncertainties, specifically higher order corrections and the choice 
of value of the strong coupling; finally we present results computed using our dataset 
both for some of the physical observables entering the fit (such as structure functions and 
reduced cross sections) and for benchmark LHC observables (the W and Z total cross 
sections). Useful formulae for the determination of uncertainties on physical observables 
using various different PDF sets are summarized in Appendix [Bj An outlook on future 
developments is provided in Section [H 
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2 Experimental data and Monte Carlo generation 



The determination of PDFs presented in this work is based on a comprehensive set of 
experimental data from deep-inelastic scattering (DIS) with various lepton beams and 
nucleon targets. We choose a purely DIS data set for this first fit because of the known 
general consistency of DIS data. We include both proton data and neutron data from 
nuclear targets in order to be able to disentangle isospin triplet and isospin singlet con- 
tribution. We also include charged current scattering data from charged lepton beams 
and neutrino scattering data in order to be able to disentangle the quark and antiquark 
distributions. Because of the limitations due to only fitting DIS data, we will take a basis 
of five independent parton distributions, namely the two light flavours and antiflavours 
and the gluon, as we discuss below in Sect. 13.61 Further constraints on PDFs could be ob- 
tained from various other experiments, in particular from hadron-hadron scattering. The 
inclusion of these data in our fit is conceptually straightforward, and is left to forthcoming 
publications. We shall first present the general features of the data we use, then discuss 
the construction of the covariance matrix, provide definitions of relevant observables and 
finally present the generation and testing of the Monte Carlo sample of pseudodata. 

2.1 The data set 

The data sets used in this study are listed in Table [H and their kinematic coverage is 
shown in Fig. [2j They can be summarized as follows. 

We use the data for proton and deuteron structure functions '"^ determined in fixed- 
target experiments by the BCDMS [47,48] and NMC [49,50] collaborations, which were 
already included in our previous analysis of the nonsinglet quark distribution in Ref . [44] . 
They provide the most accurate and up-to-date information on the valence region of parton 
distributions. They are supplemented with data on the structure functions from SLAC [51] 
which, though rather older and less precise, improve the kinematic coverage in the large x 
region. Compared to previous studies by our collaboration, we now use the ratio F2/F2 
whenever data for this observable are available, thereby benefitting from cancellations in 
the correlated systematic uncertainties. Altogether these data cover the middle- to large-x 
and smaller region of the kinematical range, corresponding to the lower-right corner 
in Fig. [21 

Collider experiments have explored a larger kinematical range in great detail. Neutral 
and charged current reduced cross sections from the HI [52-55] and ZEUS [56-61] collab- 
orations are used in the current fit. As shown in Fig. [2] these data sets yield informations 
in a much wider region of the (x,Q^) plane, in both the small-x and the large-Q^ direc- 
tions. We also include the data for Fl that have recently appeared in Ref. [62]. This is 
a rather small data set, but it provides the only direct measurement of F^. We refer to 
Refs. [42-44] for additional informations on all the data sets that were used in our earlier 
studies. 

In order to be able to control the valence-sea (or quark-antiquark) separation, in this fit 
we also include neutrino DIS data. Specifically, we use the large, up-to-date, and consistent 
set of neutrino and antineutrino scattering data by the CHORUS collaboration [63]. These 
data have a similar kinematic coverage to the fixed target charged lepton DIS data. 

The main features of our data sets are summarized in Table [H where we show the 
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Experiment 


Sot 


JVdat 


^miii 








<Ttot (%) 


F 


Rcf. 


SLAC 


SLACp 
SLACd 


211 (47) 
211 (47) 


.07000 
.07000 


.85000 
.85000 


0.6 
0.6 


29. 
29. 


3.6 
3.2 


pP 
% 


f511 
[51] 


BCDMS 


BCDMSp 
BCDMSd 


351 (333) 
254 (248) 


.07000 
.07000 


.75000 
.75000 


7.5 
8.8 


230. 
230. 


5.5 
6.6 


^2 
F.'f 


[47] 
[48] 


NMC 


288 (245) 


.00350 


.47450 


0.8 


61. 


5.0 





[50] 


NMC-pd 




.00150 


.67500 


0.2 


99. 


2.1 


/ FP 




ZEUS 
















-NC..+ 






Z971owQ2 


80 


.00006 


.03200 


2.7 


27. 


4.9 


[56] 




Z97NC 


160 


.00080 


.65000 


35.0 


20000. 


7.7 


-iVC.e + 


[56] 




Z97CC 


29 


.01500 


.42000 


280.0 


17000. 


34.2 


jCC,e + 


[57] 




Z02NC 


92 


.00500 


.65000 


200.0 


30000. 


13.2 


-JVC.e- 


[58] 




Z02CC 


26 


.01500 


.42000 


280.0 


30000. 


40.2 




[59] 




Z03NC 


90 


.00500 


.65000 


200.0 


30000. 


9.1 


-iVC. = + 


[60] 




Z03CC 


30 


.00800 


.42000 


280.0 


17000. 


31.0 


5CC,e + 


[61] 


HI 






















H197mb 


67 (55) 


.00003 


.02000 


1.5 


12. 


4.9 


-JVC,e+ 


[52] 




H1971owQ2 


80 


.00016 


.20000 


12.0 


150. 


4.2 


-iVC. = + 


[52] 




H197NC 


130 


.00320 


.65000 


150.0 


30000. 


13.3 


-iVC.e + 


[53] 




H197CC 


25 


.01300 


.40000 


300.0 


15000. 


29.8 


5CC.e + 


[53] 




H199NC 


126 


.00320 


.65000 


150.0 


30000. 


15.5 


jNC.e" 


[54] 




H199CC 


28 


.01300 


.40000 


300.0 


15000. 


27.6 




[54] 




H199NChy 


13 


.00130 


.01050 


100.0 


800. 


9.2 


-NC.e- 


[55] 




HIOONC 


147 


.00131 


.65000 


100.0 


30000. 


10.4 


-iVC,e + 


[55] 




HIOOCC 


28 


.01300 


.40000 


300.0 


15000. 


21.8 


-CC..+ 


[55] 


CHORUS 


CHORUSi^ 
CHORUSi^ 


607 (471) 
607 (471) 


.02000 
.02000 


.65000 
.65000 


0.3 
0.3 


95. 
95. 


11.2 
18.7 




[63] 
[63] 


FLH108 


8 


.00028 


.00360 


12.0 


90. 


69.2 


Fl 


|62| 


Total 


3948 (3161) 





Table 1: The experiments included in the present analysis divided in the respective data sets. We 
show the number of points before (after) applying kinematic cuts, the kinematic range, the average 
total uncertainty and the measured observable. Different sets within an experiment are correlated 
with each other, but data from different experiments are not. 

beam, target and observable, the number of data points, the kinematic range, the size 
of uncertainties averaged over the data points. The observable chosen is generally that 
which is closest to the experimental measurement and minimizes the pre-analysis by the 
experimental collaboration: in particular we have used the reduced cross section for all 
collider and neutrino data sets. The various systematics and their correlations are treated 
according to the information provided by the experimental collaborations themselves (see 
Section 2 in Ref. [42] for a detailed description of NMC and BCDMS data. Table 1 in 
Ref. [64] for ZEUS data. Table 2 in Ref. [55] for HI data,Ref. [65] for CHORUS data). 

In Table [T] we distinguish between "Experiments" , defined as groups of data that 
are not correlated to each other, and "Sets" within an experiment, which are correlated 
with each other. They correspond to measurements of different observables in the same 
experiment, or measurements of the same observables in different years which retain some 
correlated systematics. This distinction will be important in the minimization strategy, 
discussed in Section 14.21 below. 
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2.2 Uncertainties and correlations 

The covariance matrix for each experiment can be computed from knowledge of statistical, 
systematic and normalization uncertainties: 

/ Nc Na Nr \ 

COVpq = '^CTp^iaq^l + ^ CFp^n<^q,n + ^ O-p^nCTq^n + ^pq^p,s Pp,lPq,J ) (1) 
\l=l n=l n=l J 

where p and q run over the experimental points, Fj^p = Fj{xp, Qp) and Fj^g = Fj{xq, Qq) 
are the measured central values for the observables / and J, and the various uncertainties, 
given as relative values, are: ap^i, the Nc correlated systematic uncertainties; crp,n, the Na 
(Nr) absolute (relative) normalization uncertainties; ap^s the statistical uncertainty. 
The correlation matrix is defined as 

p„^ ^ , , (2) 

where the total uncertainty cip^tot for the p-th point is given by 



Cp,tot = Y CTp^s + (^p,c + ^p,N ' (^) 

the total correlated uncertainty fip.c is the sum of all correlated systematics 

Nc 

^lc = ^^li, (4) 
1=1 

and the total normalization uncertainty is 

Na Nr 

71=1 n=l 

The factor of one half in the relative normalization uncertainties comes from the first order 
expansion of Eq. (I14p below. 

The Nu uncorrelated systematic uncertainties quoted for HERA data sets are combined 
with the statistical uncertainty according to 

Nu 

^p,s = (^l.stat + Y ^P<k- (6) 
k=l 

Asymmetric uncertainties quoted for some ZEUS data sets in Refs. [58-61] are symmetrized 
as described in Section 2 of Ref. [43] and references therein. For the case of SLAC data 
the single systematic uncertainty is taken to be fully correlated for all the data points. 

2.3 Observables and kinematic cuts 

The deep-inelastic observables used in our fit are either structure functions or reduced 
cross sections. The neutral current deep-inelastic scattering cross section involving a 
generic charged lepton is defined as 

^^^^ix,y,Q^) = ^-^[Y^Frix,Q^)^Y_xFi^^ix,Q^)-y^F^!^ix,Q^)^ , (7) 
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where 



y± = l±(l-y)2. 



(8) 



In the case of NMC, BCDMS, and SLAC data, we use the quoted value for the structure 
function F2{x,Q'^). For ZEUS and HI data, we use the quoted reduced cross section 
defined as 



dxdQ^ 



{x,y,Q 



(9) 



For charged current deep-inelastic scattering, the measured double differential cross 
section in the case of unpolarized beams is given by 



G'p 



47rx +Q2 



1 

X - 

2 



(10) 



y+F--'^^ {x, Q') T y.xFg'^^'^^ (x, Q') - y^F--'- (x, Q 



± 



,2 77CC,e±, 



As in the neutral current case we use the reduced cross section defined as 



:CC,e± 



{x,y,Q'^ 



47rx \M'+Q 



-1 



2^CC,e± 



d'a 



dxdQ^ 



-{x,y,Q'^) 



(11) 



In the case of CHORUS data, we use the neutrino-nucleon reduced cross section, which in 
the single Vl^-exchange approximation, can be written as 



a-('^)(x,y,Q2) 



x,y,Q^ 



1 dV-(^) 
El, dx dy 
GIMn 

2n{i + QyM^y 



(12) 



2M%x'^y'^ 



± y_ xR 



For all nuclear targets, namely NMC, BCDMS and SLAC deuteron data and CHORUS 
heavy nuclei (mostly lead with a small admixture of iron and other materials), no nuclear 
corrections are applied. 

In order to keep higher-twist corrections under control, only data with > 2 GeV^ 
and W"^ > 12.5 GeV^ are retained. The changes, if any, in the number of data points 
after kinematic cuts for each set are reported in Table [1] between parenthesis. The exper- 
imental data actually used in the present analysis are summarized in Fig. [2l Since the 
kinematic cuts we use are not too conservative, we will supplement our fit with target 
mass corrections, as discussed in Section] 



2.4 Generation of the pseudo-data sample 

Error propagation from experimental data to the fit is handled by a Monte Carlo sampling 
of the probability distribution defined by data. The statistical sample is obtained by 
generating A'rep artificial replicas of data points following a multi-gaussian distribution 
centered on each data point with the variance given by the experimental uncertainty. 
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Figure 2: Experimental data in the (x, Q^) plane used in the present analysis after kinematic cuts. 



More precisely, given a data point Fj'^^^^ = Fj{xp, Qp) we generate k = 1, . . . , A^rep artificial 
points Fj^^^^^^^ as follows 



^(art)(fc) _ g(k} ^(exp) / , V- (fc) , (fc) 

\ 1=1 



Nc 



k = l,...,N, 



rep ) 



(13) 



where 



S. 



(k) 



p,N 



Na 

n 

ra=l 



^ ' ' p,n'-'p,n 



Nr 

n 

n=l 



1 _L 



(14) 



The variables f'^p} ,T\p\r'^)i are all univariate gaussian random numbers that generate 
fluctuations of the artificial data around the central value given by the experiments. For 
each replica /c, if two experimental points p and p' have correlated systematic uncertainties, 

(k) (k) 

then = r\ I, i.e. the fluctuations due to the correlated systematic uncertainties are 



p,i p' 

(k) 

the same for both points. A similar condition on rp_„ ensures that correlations between 
normalization uncertainties are properly taken into account. 

The treatment of normalization uncertainties needs some care: as is well known, in- 
cluding normalization uncertainties in the covariance matrix would lead to a fit that is 
systematically biased to lie below the data [66]. Rather, normalization uncertainties are 
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hixpcrimcnt 


NMO 


NMC"-pcl 


SLAC 








9.0 •10"'^ 
1.000 


1.8 .10"^ 
1.000 


3.1 ■10"'' 
1.000 


1.3 ■lO"^ 
1.000 




/ PC 1 /„(!irt)\ 1 \ 

/<,(cxp)\ 

\ / dat 

\ / dat 


1.5 . 10 
0.0147 
0.0146 
1.000 


4.2 .10 ^ 
0.0170 
0.0171 
0.998 


3.1 ■lO 
0.0104 
0.0104 
0.998 


4.0 .10 
0.0698 
0.0692 
0.999 




/p(oxp)\ 

\ /dat 

\ /dat 


0.033 
0.033 


0.165 
0.176 


0.312 
0.311 


0.470 
0.463 




r ]p 


0.963 


0.988 


0.987 


0.994 




(cov'™P' ) 

\ / dat 

(cov(--')\ 

\ / dat 


6.52 -10"^ 
6.78 ■10"'^ 


4.39 -10-^ 
4.73 .10"^ 


3.07 .10"^ 
3.03 10"^ 


2.90 .10"^ 
2.82 ■lO"'' 




T COV ( ^■''^ ) 


0.989 


0.984 


0.988 


0.999 




Experiment 


ZEUS 


HI 


CHORUS 


FLH108 


Total 


(-^l(-"'")j)dat 
1 


8.5 ■10"'' 
1.000 


1.1 .10"" 
1.000 


1.8 ■lO"^ 
1.000 


1.3 -10"^ 
1.000 


7.1 -10"^ 
0.980 


\ / dat 
\ / dat 


9.6 10 
0.0607 
0.0603 


4.2 -10 
0.0472 
0.0472 


1.8 .10"^ 
0.1088 
0.1109 


6.1 .10"^ 
0.1744 
0.1756 


3.0 -10 
0.0556 
0.0562 




1.000 


1.000 


0.998 


0.999 


0.980 


/p(oxp)\ 

\ / dat 

\ / dat 


0.079 
0.082 


0.027 
0.028 


0.094 
0.096 


0.650 
0.657 


0.145 
0.146 




0.982 


0.952 


0.998 


0.996 


0.996 


\ / dat 
(cov("')\ 

\ / djit 


1.53 ■10-*' 
1.57 .10"^ 


4.93 -10-^ 
5.03 -10"^ 


2.16 -10-^ 
2.31 -lO"-' 


2.03 -10-^ 
2.11 -10"^ 


1.07 10-^ 
1.01 .10"'' 


r GOV ''^^^^ 


0.996 


0.987 


0.998 


0.998 


0.997 



Table 2: Statistical estimators for the Monte Carlo artificial data generation with N^^p = 1000. 
The definition of the statistical estimators is given in Appendix B of [44] . 



included by rescaling all uncertainties, i.e. by constructing for each replica a modified 
covariance matrix: 



COV 



(k) 



pq 



(15) 



with the statistical uncertainties ap^s and each systematic uncertainty ap^i being rescaled 
according to 



a, 



ik) 



p,i ~ '~'p,n'^p,i ' 



^ = 1, 



It can be readily seen that: 



cov 



(fc) 



pq 



COV 



<exp) qik) q(k) 
Pg'~'p,N'^q,N ' 



(16) 



(17) 



and therefore the experimental correlation matrix without normalization uncertainties 
needs to be evaluated only once, while cov^'^^pg is obtained by multiplying by the normal- 

(k) (k) 

ization factors and ^ for each replica. If within an experiment all the sets have 
only a common global normalization uncertainty, the rescaling is an overall multiplicative 
factor. The covariance matrix Eq. (jlSp is that which is used in order to perform a fit to 
the A:-th data replica. 
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Appropriate statistical estimators have been devised in Ref. [44] in order to quantify 
the accuracy of the statistical samphng obtained from a given ensemble of replicas. We 
refer the reader to Appendix B of Ref. [44] for a detailed explanation of the meaning 
of these statistical estimators. Using these estimators, we have verified that a Monte 
Carlo sample of pseudo-data with Nj-^p = 1000 is sufficient to reproduce the mean values, 
the variances, and the correlations of experimental data with a 1% accuracy for all the 
experiments. Results for the estimators computed from a sample of iVrep = 1000 replicas 
are shown in Table [2j This set of Monte Carlo replicas will be used in the rest of this 
paper. 
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3 From parton distributions to physical observables 



In this section we provide all the technical details for the calculation in perturbation 
theory of deep-inelastic observables from a set of initial PDFs. First, we briefly review 
the strategy for the solution of QCD evolution equations in terms of pre-computable 
perturbativc hard kernels K]rj{{x, as{Q'^), as(QQ)), originally introduced in Rcf. [44]. Then 
we review the calculation of the DGLAP evolution factors rij{x, Ug^Q^), as{QQ)) using 
Mellin space techniques, including our prescription for heavy quarks, and give details 
of their benchmarking. Next, we turn to the particular choice of basis for the input 
PDFs. Finally, we describe the calculation of physical observables by combining evolved 
PDFs with the hard coefficient functions (including their target mass corrections) and 
the procedure for obtaining the hard kernels Kij{{x,as{Q^),as{QQ)) for deep inelastic 
observables. 

3.1 Leading-twist factorization and evolution 

The perturbative computation of physical observables involves first evolving the PDFs up 
to the scale of the measurement, and then their convolution with a hard cross-section to 
give the observable. Here this is done following the strategy of Ref . [44] , whereby evolution 
kernels are pre-computed, and then convoluted with parton distributions. This separates 
the numerical computation of the solutions to evolution equations from the computation 
of input parton distributions. The advantage of this is that each of the two computations 
can be optimized separately from a numerical point of view: in particular, we can thus use 
a Mellin-space approach to solve evolution equations, but adopt x-spacc paramctrization 
of PDFs. Also, evolution kernels can thus be pre-computed, benchmarked, and stored for 
future use during the fitting procedure. 

The basic ideas behind this technique were discussed in [44]. The extension from the 
nonsinglet structure function, with only one PDF, to a number of different deep inelastic 
structure functions and reduced cross-sections, expressed in terms of several singlet and 
nonsinglet PDFs, is in principle straightforward, but in practice complicated by a number 
of subtleties which will be discussed as they arise. 

Deep inelastic observables Fi{x,Q^) (which may be structure functions or reduced 
cross-sections) may always be expressed at leading twist as a convolution of parton distri- 
butions fj{x, Q^) and hard coefficient functions Cij{x, as{Q'^)), computed in perturbation 
theory: 

Fi{x,Q^) = ^C7,(x,a,(g2)) ^ fj{x,Q^), (18) 
j 

where denotes the convolution 

fix)^g{x)^ ['^f{y)g(^), (19) 

Jx y \yJ 

and the indices / and j run over observables and parton distribution functions respec- 
tively. The scale dependence of the parton distribution functions is in turn given by the 
renormalisation group, or DGLAP equations 

Q^-^fiix,Q^) = T.Pijix,o,s{Q^)) <^ fj{x,Q^), (20) 
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where Pij are the AltareUi-Parisi sphtting functions, also calculable in perturbation theory. 
The solution of these coupled integro-differential equations may be written as 

Mx, Q2) = ^ r,,{x, a„ aO) ® fj{x, Q^), (21) 
j 

where fjix^Q^) are the input PDFs, to be determined empirically, rjj(x,as,ag) are the 
evolution factors, and we use the shorthand notation 

a, = as{Q^), a'i = as{Ql). (22) 

The evolution factors also satisfy evolution equations: 

d 

Q^-^^ijix,as,a°) = '^Pik{x,as) T^jix, Us, a^^), (23) 

k 

with boundary conditions Tij{x, a^, a^) = 6ij6{l — x). 
Substituting Eq. into Eq. ([18]) 

Fj{x,Q^) = y2Cij{x,as) (g) rjfc(x,as,Q°) fkjx.Ql) 

jk 

= Y,Kijix,as,a'',)(^fjix,Ql), (24) 
j 

where the hard kernel 

Kij{x,as,a°^) = '^Cik{x,as) rA;j(x, a^, a°), (25) 
it 

may be computed in perturbation theory. 

Performing many nested convolutions is numerically rather time consuming. However 
the hard kernels Eq. (j25l) are independent of the particular set of input PDFs adopted, and 
may thus be calculated once and for all at the beginning of the computation, interpolated, 
and stored. Determining the physical observables given by a given set of input PDFs then 
involves the evaluation of only the one set of convolutions Eq. (j24p . which is relatively 
fast, these being reducible to simple sums. 

3.2 Solving the evolution equations 

The QCD evolution equations are most easily solved using Mellin moments [16,67,68], 
since then all the convolutions become simple products, and the equations can be solved 
in closed form. The problem is thus reduced to the computation of the single Mellin 
inversion integral. Specifically, we define 

Tij{N,as,a^s) = / dxx^~^rij{x,as,a'^) , (26) 
Jo 

where by slight abuse of notation we denote the function and its transform with the same 
symbol. Equation (p3]) becomes 

Q'ir,,(iV,a„a°) = J]7^fc(iV,as)r,.,(Ar,a„a°) , (27) 

k 
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where the anomalous dimensions 7^ j ( A^, ) are the Melhn moments of the sphtting func- 
tions. Expanding perturbatively in powers of as 

7.,(iV,a,) = asjifiN) + a^lfiN) + ■■■, (28) 

where the dots denote higher order contributions. The anomalous dimensions are known 
at LO, NLO [69-74] and NNLO [75,76]. 

Since all the dependence on of the anomalous dimension is through the running 
coupling as{Q'^), and 

= Pias) = -a% - a% + ■■■, (29) 
we may in turn write Eq. (I27p as a differential equation in Og' 

— |— r,^(iV,a„a°) = -^Rik{N,as)rkj{N,as,a°,). (30) 

The matrix Rij = (R-)ij has the perturbative expansion 

R(iV, as) = R(°) (N) + q,r(i) {N) + -- - , (31) 
where in terms of the expansion Eq. (|28p of the anomalous dimension matrix jij = (7)4^ 

R(0)(^r) ^ R(fc)(Ar) ^ 1^ _ y |lR(^-)(Ar). (32) 

Note that Eq. ([27]) truncated at NLO, that is with Eq. ([28]) and Eq. ([29]) truncated after 
the first two terms, is not equivalent to the naive truncation of Eq. ()3ip after two terms, 
but rather to the complete series with 

R(o)(Ar) = 7(o)(Ar)//3o, RW(Af) = -hiR^^-^\N), (33) 

where hi = Pi/Pq. 

The complete matrix of anomalous dimensions "f, and thus the matrices R are in fact 
almost completely diagonal: all the flavour nonsinglet and valence quark distributions 
evolve multiplicatively, and only the singlet quark and gluon actually mix. Thus we only 
need to solve Eq. (1301) for one by one and two by two matrices. 

Consider first the simplest case of the evolution of flavour nonsinglet and valence quark 
distributions: the evolution factor then satisfies the simple first order equation 

^ rNs(iV,a.,a°) = -i?Ns(A^,as)rNs(iV,as,a°). (34) 



dlna^ 

At LO the solution is trivial: 



a. 



-R'- 



(0) 
NS 



rNS,Lo(iV,a„a^) = (^) , (35) 
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while at NLO we need to work a little harder: using Eq. (j33p one finds 



NS.NLO 



{N,as,a^) = exp 



a. 



a. 



(0) 



bi Vl + ^i"s, 

This exact solution is equivalent up to subleading terms to the linearized solution 



(36) 



-XX 



(0) 



(37) 



which is in turn the exact solution to Eq. with i?NS = -^ns + o^s-^ns- 

Turning finally to the singlet sector, we need to solve Eq. (j30p when R are two by two 
matrices, corresponding to coupled singlet quarks and gluons: 



d 



At LO we can proceed by diagonalization: 

rs,Lo(A^, a°) = UN, as, a°) = e+(iV) 



rs(iV,a„a°) = -Rs(iV,a,)rs(A^,Q„a°). 



-\+{N) 



+ e„(iV) 



-A_(iV) 



where 



7j°; (iV) + (iV) ± y(7j?(iV) - li'JiN))" + 47S?(iV)7^?(iV) 



(38) 



(39) 



(40) 



are the eigenvalues of the two by two matrix Rg''^(A^) of singlet anomalous dimensions, 
and 



e±(A^) = ±- 



(41) 



"A+(iV) - A_(iV) 

are the corresponding projectors. 

The full NLO solution is more complicated, and must be developed recursively as a 
perturbative expansion around the LO solution li{N,as,a^): writing 



rs,NLo(iV, a., a°) = U(iV, Q.)L(Ar, a„ a°)U(7V, a°)-\ 
where \J{N,as) has the expansion 

U(iV, a,) = 1 + a,U(i) (N) + a^U^^) (N) + ■ ■ ■ , 
solves Eq. (jSOl) provided 



3_R('=)e+ e+RWe_ 
+ 



A_i_ — A_ — fc A_ — A+ — A; k 



e+ + e. 



where 



k-l 



r(0)=r(°\ 



rW=rW+J]r(^-)u«. 



(42) 
(43) 

(44) 
(45) 



1=1 
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By solving recursively Eqs. (j44p . ()45p with the NLO approximation Eq. (j33p . the NLO 
evolution factor Eq. ([33]) can be computed. Just as in the nonsinglet case, the exact NLO 
solution may be linearized to give 

riVoCA', a., «°) = MN, as, a°) + asU(^)(iV)L(iV, a„ a^) - a°L(iV, a„ aO)u(i)(A^), 

(46) 

which again is an exact solution to the truncated evolution equation, and equivalent to 
the full solution Eq. ( ([^2]) ) up to subleading terms. 

In what follows we will use the exact solutions Eq. (I36p and Eq. (I42p in order to be able 
to compare our results directly to those of x-space codes (see e.g. [77,78]) which integrate 
the evolution equations numerically. However in the comparison to the data we choose 
the linearized solutions Eq. ([37|) and Eq. (jl6]) . 

Generalization of the NLO solutions to NNLO and beyond is straightforward but 
tedious. 



3.3 Calculating the evolved x-space PDFs 

The x-space evolution factors are obtained by taking the inverse Mellin transforms of the 
solutions obtained in Eq. ([36]) and Eq. (|12|) . For the nonsinglets 

rNs(x,as,a°) = / ^x-^rNs(iV,as,as), (47) 

where C is taken to be the Talbot contour described in ref. [44], which goes around the 
singularities at = 0, —1, —2, .... For the singlets we use instead 

rs(x, a„aO)=/ ^x-^rs(iV,a„aO) =x / ^x-^rs(iV-l,as,«°), (48) 

since now the singularities are at = 1, 0, —1, . . ., i.e. displaced by one unit to the right. 
The contour integrals are evaluated using the Fixed Talbot algorithm [79]. 

However all splitting functions, except the off-diagonal entries of the singlet matrix, 
diverge when x — > 1; this implies that the evolution kernels T{x,as,0!^) will likewise be 
divergent as x — > 1, and must thus be interpreted as distributions. Specifically, we define 

T^^{x,as,a°) = Tf^s{x,as,a°s) - GNs{as,oPs)6{l - x) , (49) 
4+^(x,a„a°) = rs(x,a„a°)-Gs(as,a°)x-i5(l-x), (50) 



where 



-1 

GNs(as,a°) = / (ixrNs(a;,as,a°) = rNs(^^, as,as)U=i) (51) 







Gs(as,a°) = / dxxTs{x,as,a^s) = '^s{N,as,a^s) 
Jo 



\N=2, 



(52) 
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are all finite constants. The convolutions Eq. (j2T[) may then be evaluated as 

Jx y \y 

= ^GNs(as,a°) - y dyr^s{y,as,a^s)^ Mx,Ql) 
+ 

for nonsinglet distributions fi, and similarly 

fs(x,Q2) = Gs(a„aO)fs(x,Q2)+ /■'^rW(y,a„aO)fs(-,Q, 

y \y 

Gs(as,a°) - y dyyrs(y, as,a°)^ fs(x,Qo) 



- rNs(2/, a., a°) (/. Qg) - yfiix, Ql),y (53) 



) , (54) 



for singlet distributions fs, where now all integrals converge and can be computed numer- 
ically, in practice using Gaussian integration as described in ref. [44]. 

3.4 Flavour decomposition and heavy quarks 

The primary quantities fj in Eq. (|20p may be thought of as the 2nf quark and antiquark 
distributions qi and and the gluon distribution g. The singlet quark distribution 

Uf 

^ = ^{m+q.), (55) 

i=l 

and the gluon distribution mix under evolution, as in Eq. (I54p : a sensible basis is fs = 
(S,(jr). For the remaining 2nj — 1 distributions we adopt a basis of charge conjugation 
eigenvectors which each evolve independently according to Eq. (f53]) : a suitable such basis 
consists of the nj — 1 charge conjugation even nonsinglets 

Ts = n+-d+, 

Tg = M+ + (i+-2s+, 

Tis = n+ + d+ + s+ - 3c+, 

T24 = M+ + + S+ + C+ -46+, 

T35 = n+ + (i+ + s+ + c+ + 5+-5t+, (56) 

where qf^ = qi±qi, and qi = u, d, s, c, 6, t are the various flavour distributions, which evolve 
with evolution factor Eji^g, and the Uf charge conjugation odd valence distributions 



V 


= 


+ d- 


+ s 


+ c 


+ b +t , 


V3 


= u~ 


-d~ 


1 








= u~ 


+ d- 


- 2s 


1 






= 


+ d- 


+ fi- 


- 3c 




V24 


= u" 


+ d- 


+ 


+ c- 


-46-, 


V35 


= 


+ d- 


+ fi- 


+ c- 


+ ft- - 5i- 



(57) 
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the first of whicfi (tlic singlet) evolves with evolution factor Fjljg, while the remainder 
evolve with evolution factor Tj^g. At LO all the quark anomalous dimensions are equal: 

Tns'"^ = 7ns'~ = 7ns''' = 7s'J!q' and thus at LO T+g = Tj^g = = Tf. However at 
NLO 7^^g' = 7/J^5^, while all the others are different: beyond NLO all the anomalous 
dimensions, and thus evolution factors, are different from each other. 

We regard the first three flavours u, d and s as "light" : together with the gluon we thus 
have seven parton distributions 5', S, T3, Tg, V3, Vg which are intrinsically nonperturba- 
tive and are thus, at least in principle, to be determined empirically. The parametrization 
of these PDFs will be discussed in the next section. The remaining three flavours c,6 
and t are regarded as "heavy": this means that we assume that the six parton distribu- 
tions Ti5, T24, T35, ^15, V24, V35 have a component which may be computed perturbatively. 
Of course, it is in principle possible to also introduce nonperturbative (or "intrinsic") 
contributions to these quantities. 

In this paper we use the zero mass variable flavour number (ZM-VFN) scheme to 
incorporate the effects of the heavy quarks. In this, the simplest heavy quark scheme, the 
number of virtual flavours in the [3 function and anomalous dimensions changes abruptly 
at the heavy quark thresholds: this means that while the PDFs are continuous, their 
scale dependence is discontinuous. Thus for example, when computing r(A?^, cts, Q;°) for 

> m^, we write 

r(iV, as, a°) = T{N, as, a^)r(iV, a^, a°), (58) 

where a^ = as(m^), and compute the two factors on the right hand side with rij = 4 and 
Uf = 5 respectively. This scheme neglects terms above threshold which are proportional 

to powers of where rrih is the mass of the heavy quark, thereby losing accuracy for 
scales close to the thresholds. 

The heavy quark distributions themselves are assumed to be zero below threshold, and 
then generated radiatively above threshold. Consider for example the charm distribution. 
For simplicity we take Qq = m^. For < m^, = 0, so Tis = S, V15 = V, while for 

> ml, Ti5 and V15 evolve as nonsinglet distributions: 

Ti5{x,Q^) = r+g(x,a„a°)0S(x,g2), (59) 
Vi5{x,Q^) = r^^s(a:,a.,a°)0y(a;,Q2). (60) 

The difference between Ti5(a;, Q^) and the quark singlet S(x, Q^) for < < gives 
the charm distribution c"*". Similarly c~ is given by the difference between ^15(3;, Q^) and 

V{x, Q"^): however since at NLO F^g = F](f^, F15 = V and c = c. 

For the h distribution the situation is a little more complicated since now > Qq. 
Assuming now that 6=*= = for Q"^ < rn^, we have for Q'^ > m? 

T2i{x,Q^) = T^^{x,as,a\) ®T.{x,ml) 
= rNs(^'«s'"s) ® 

(Tf{x, ala'i) ® E(x, QI) + Tfix, a^, a°) ® g{x, QD) . (61) 

It is thus convenient to define the evolution factors 

r^g^'«(7V,«„aO) = r+g(7V,«„a^)r^g«(7V,«^,aO), (62) 
r|^'^(7V,a„aO) = r+g(7V,a„a^)r|^(Ar,a^,aO). 
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Note that these are inverted and convoluted using the singlet formulae Eq. (08]) and 
Eq. since at least a part of the evolution is singlet. Below threshold, i.e. for < ml, 
Eg ''^ = Eg'', Eg = Eg^. Similarly, for evolution of the valence contribution V24(x,Q^) it 
is convenient to define 

E2fs(iV,a„aO) =Ej,s(iV,a„a^)Ej:js(iV,a^,aO). (63) 

above threshold, with E^l. = Ej:js below. At NLO, for ah E^l. = E^g = EJIjg, thus 
V24 = V and b = b. 

Precisely similar considerations apply to the top distribution: for > we define 

Ef'«(Ar,a,,a°) = E+g(iV, a„ a*)Eg«(Ar, , a"), (64) 
Ef'^(Ar,a„a°) = E+g(iV, a,, a*)Eg^(7V, a* , a°). 
E3fg(Ar,a„aO) = E^;jg(Ar, a„ a*)Ej:jg(iV, a*, a"), (65) 

to give the evolution of Ts^ix, Q^) and Vs^ix, Q^). For < Ef = E|^ vf'^ = Tf, 
Efg = EJ^g, and at NLO for ah Q"^, Efg = E^g = EJ^jg, V35 = and t = t . Note however 
that all the data we currently use to determine the PDFs are actually below the top 
threshold. 

3.5 Practical implementation and benchmarks 

The solution of the evolution equations through the determination of x— space evolution 
factors, Eqs. ([53|) and ([Mj) . is particularly efficient because of the universality of the 
evolution factor, i.e., its independence of the specific boundary condition which is being 
evolved. This means that the evolution factors can be pre-computed and stored, and then 
used during the process of parton fitting without having to recompute them each time [44] . 

During PDE fitting, a given PDE set must be evolved many times up to the fixed 
values of {x,Q'^) at which data are available. It can be seen that for each {x,Q'^) the 
numerical determination of the right-hand side of Eqs. (|53l54p involves the evaluation of 
two contributions: the first requires the multiplication of the PDE by a (predetermined) 
constant {G{as,a^) — /o^dy E(?/, a<j, a^)) while the second requires a convolution of the 
(predetermined) evolution factor T{y,as,C(^) with the (subtracted) PDE, and thus the 
numerical evaluation of the integral over y. To perform this numerical integration we 
use A'^quad^point gaussian integration in each of the 2^''<=''"^^ — 1 intervals in which the 
integration range (x, 1) of y is divided. The total number of points used to perform the 
convolutions in y of Eqs. (j53|) and is then given by 

Apt = Aquad (2"^"=' + ^ - 1) , (66) 

and we determine the values of y accordingly, for each given value of x. We find that 
-^quad = 4 is precise enough for all applications, and we discuss below in detail the choice 

of Aiter- 

The accuracy of our PDE evolution code, described above, has been cross-checked 
against the Les Houches PDF evolution benchmark tables [41,80]. Those tables were 
obtained from a comparison of the HOPPET [78] and PEGASUS [67] evolution codes, which 
are x— space and A— space codes respectively. In order to perform a meaningful compar- 
ison, we use the iterated solution of the A— space evolution equations (see Eqs. (pHI) and 
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Table 3: Comparison of the accuracy of our PDF evolution with respect to the Les Houches 
benchmark tables for different PDF combinations at NLO in the ZM-VFNS. We show results for 
two values of Alitor , which define the number of points over which the gaussian integrations are 
performed, as discussed in the text. 

(j42p ). and use the same initial PDFs and same running coupling, following the procedure 
described in detail in Ref. [41,80]. 

We show in Table[3]the relative difference ej-ei for various combinations of PDFs between 
our PDF evolution and the benchmark tables of Refs. [41,80] at NLO in the ZM-VFNS, 
for two different values of A/jter, Eq. ([66]) . In the upper part of the table we show a very 
accurate evolution to prove the correctness of our technique, with A^iter = 6, that is, with 
approximately 500 points used to perform the convolution integrals. As we can see, this 
choice leads to an accuracy which is enough to reproduce the Les Houches tables with 
O (10-^) precision for all values of x, which is the nominal precision of the agreement 
between HOPPET and PEGASUS. 

In the lower part of Table [3] we show the accuracy results for the actual parameters 
which are used in the neural network fit. We take A'iter = 4, i.e. integration with 128 
points, since this is enough to reach an accuracy of O (lO-^ — 10-^) in the region of x 
relevant to the available experimental data. Such an accuracy is enough for practical 
purposes, considering the typical sizes of both experimental and theoretical uncertainties. 
The use of a smaller number of points to compute the convolutions allows a much faster 
evolution, advantageous in the context of a PDF fit. 
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We also checked the hnearized solutions Eqs. ()37p and (j46p against PEGASUS, and ob- 
tained a similar level of agreement. 



3.6 Parametrization of input PDFs 

The non-perturbative input to the present analysis are five PDFs, parametrized with 
neural networks, at a fixed initial evolution scale, which we choose to be = 2 GeV^. 
PDFs at higher values of are then determined by perturbative evolution, as discussed 
in Sec. 13.21 and Sec. 13.41 

The most unbiased approach in a PDF analysis would be to parametrize all seven inde- 
pendent light PDFs at the initial evolution scale Q\. However, since the experimental data 
sets which are used in the present analysis give very little constraint on the strange PDFs, 
we choose for economy to independently parametrize only the gluon and the four lightest 
quark flavours, n, n, d, d, and fix s and s through two constraints. Also, we determine all 
heavy quark PDFs from perturbative evolution, thereby neglecting intrinsic heavy quark 
contributions. 

We have the flexibility to select any basis for our PDFs: the neural nets have suffi- 
cient flexibility to accommodate any reasonable choice. With the standard PDF fitting 
framework, this is not necessarily the case since specific functional forms are chosen so 
that at least some parameters have a physical interpretation: well known examples are 
the large-x parameters which are related to counting rules, and the small-x exponents 
typically inspired by Regge theory. Therefore, in standard parametrization the choice of 
a specific basis is likely to affect the form of the results [81], whereas we will be able to 
verify explicitly in Sect. 15.41 the independence of the parametrization of our result. 

The specific basis we choose at Qo is given by the following linear combinations: 

• the singlet distribution, = YTiL\ {Qi{x) + Qi{x)), 

• the total valence, V{x) = Y^^£i (Qii^) ~ Qii^)), 

• the non-singlet triplet, T'j,{x) = (u{x) + u{x)) — [d{x) + d(x)), 

• the sea asymmetry distribution, As{x) = d{x) — u{x) = ^{Vs{x) — T3(x)), 

• the gluon, g{x). 

For the strange quarks we make two assumptions: 



We set the constant Cg, the ratio between strange and non-strange sea, to the value 
Cs = 0.5, which is approximately equal to the relative size of the respective contribution 
to the nucleon momentum. Recent dimuon data [82, 83] tend to favor somewhat smaller 
values of the momentum fraction carried by strange quarks [32], however the choice to fix 
Cs at the ratio of momentum sum rules is per se arbitrary and it should be understood as 
a rough approximation. 

The assumption that all heavy quarks are generated radiatively, as described in Sec. l3.4t 
is implemented by taking c(x) = c{x) = b{x) = b{x) = t{x) = t{x) = at the initial 
scale Qq. The vanishing of intrinsic heavy flavour contributions should be taken as an 




(67) 
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approximation, justified by the fact that the intrinsic charm contribution is hkely to be 
small [84], and it is almost unconstrained by the data in our fit. The assumption of using 
only five independent parton distributions is thus a source of theoretical uncertainty. As 
for all theoretical uncertainties, the only way of accurately assessing its impact is to study 
how results change when a more accurate theory is used. 

On top of the constraints from experimental data, PDFs have to satisfy a set of sum 
rules which follow from conservation laws. The sum rules implemented in our analysis will 
be the momentum sum rule, 



/ dx X + g{x)] = 1 , 
Jo 



(68) 



and the valence sum rules 

"1 .1 

dx {u{x) - u{x)) = 2 , dx {d{x) - d{x)) = 1 . (69) 
Jo 

Note that once the sum rules are satisfied at the initial evolution scale Qq, they will be 
satisfied for any other values of . The implementation of the sum rules in our approach 
will be described in Sec. 14.11 

3.7 Hard cross-sections and physical observables 

To determine the input PDFs we must not only be able to evolve them to a particular 
scale, but we must then compute physical observables to compare to experimental data. 
This involves convolution of the evolved PDFs with hard coefficient functions Eq. ()18p . As 
explained in Sec. 13. H this may be done most efficiently by pre-computing the hard kernels 
Eq. ()25p . which can then be convoluted with the initial PDFs as in Eq. (124p . In Mellin 
space 

i^7,(A^,Q„a°) = J^C/fc(iV,a,)rfc,(iV,a„a°), (70) 

k 

SO the most efficient procedure is to compute Kjj{N,as,a^), and invert the Mellin trans- 
form using formulae corresponding to Eqs. ()47l I48p . i.e. 

Kj-ixa aO._^Ic^^-'''KJ,iN,a.,a% if j=T,V, 



The convolutions Eq. (p^ can then be performed using formulae analogous to Eqs. (j53|54p : 



writing Fj{x, Q^) = ^ • Fjj(x, Q^), for the nonsinglet contributions (i.e. j = T, V) 



Fij{x,Q^) = (^Kij{as,a°s) - dy Kij{y,as,a°s)^ fj{x,Ql) 

+ £^i^,,(y,a„«0) (^,g2^ -y/,(x,Q2),^ . (72) 
while for the singlets (i.e. j = S,^) 

Fij{x,Q'^) = (^Kij{as,a^s) - dyyKij{y,as,a^,)^ fij{x,Ql) 

+ J^^Ki,iy,as,a^s) (z.' (^,Qo) -y'/,(x,Qg)) , (73) 
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where 



J^dxxKij{x,as,a'^s) = Kij{N, as,a^s)\N=2 



if j = T,V, 



(74) 



are all finite constants. The convolutions in Eqs. (j72l73p are evaluated in precisely the 
same way as those in Eqs. (j53|54p . i.e. as described in Sec. 13.51 with all the kernels pre- 
computed. It remains to give expressions for the Mellin space kernels Eq. (j70p . There 
are very many of these, roughly the number of observables times the number of PDFs. 
Complete expressions for all the observables and PDFs used in our current analysis may 
be found in Appendix lAl 

Structure functions computed using the kernels of Appendix [X] can be compared to 
experimental data directly, or after having been combined into reduced cross-sections as 
discussed in Sect. 12.31 with the only addition of target-mass corrections, to be discussed 
below. Besides direct experimental information, a further constraint on the input PDFs 
comes from the requirement of positivity. Indeed, even though, as well known, PDFs are 
not positive-definite beyond LO, cross sections must remain positive, and this constrains 
the set of admissible PDFs [85]. The implementation of positivity constraints is nontrivial, 
because in principle one should require positivity of all observables, regardless of the fact 
that they are measurable in a realistic experiment. In practice, we will only impose a 
positivity constraint which has an immediate implication on the admissible gluon distri- 
bution. Namely, we impose (in a way to be described in Sec. 14.21 below) positivity of the 
longitudinal structure function Fl{x,Q'^) for > Qq and x > 10~^. This has the effect 
of vetoing gluon distributions which become too negative at small x, though a negative 
gluon remains allowed. Imposing such a constraint for even smaller values of x is delicate 
since Fl has a perturbative instability in this region, which could only be cured through 
small-x resummation (see Ref. [86] and references therein). 

3.8 Target mass corrections 

We compute all physical observables using leading twist perturbation theory, and higher 
twist corrections are kept under control by our choice of a relatively high kinematic cut, 
as discussed in Sect. 12.31 However, we do include Target Mass Corrections (TMCs) up to 
twist four, since these are of purely kinematic origin and can be determined exactly [87]. 
The implementation of TMCs in the present analysis is different to that in [44]: here we 
rearrange the TMC so that it is explicitly factorised into the hard kernel, and can thus be 
pre-computed along with the perturbative evolution and coefficient functions. 

To see how this works, consider first the structure function F2{x, Q^). From Eq. (4.19) 
of Ref. [87], F2 at twist four is given in terms of the leading twist F2 by 




(75) 



where 



r = 1 + 



4M^x2 



2x 



(76) 
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where Mjq is the mass of the target, and 

Taking MeUin transforms with respect to ^: 

^ c 



(77) 



(78) 



while 



tN fl 



N 



dz 



1 

iV 



F2(iV-l,Q2), 



^'"^ F(N 1 O^W W ^^"^ 



so 

/2(e,Q' 

Now, by substituting Eqs. ()78|79p into Eq. (fTSl) we obtain 
dN 



F2{N,Q^). (79) 



-AT 



737^ ^ Q2 ^r2(iV 



^)E^2,i(iV,as)/,(iV,g'). (80) 



We can reinterpret the factor in front of C2.j{N ^ag) as the new target mass corrected 
coefficient function: 



C2j(iV, as,r) 



4r3/2 



1 + 



3(l-r-i/2) 



+ 1 



(81) 



The target mass corrected hard kernel is then simply 

Kf,M^ as,a',) = Y, [ ^ r''C2,k{N, as,T)Tkj{N, 



as, a°). 



(82) 



The same procedure can be applied to find the target mass corrections to the F^ and 
Fl structure functions. For F3, from Ref. [87] we have 



FsitQ 



2, _ xF3(e,g') , 4M2, X 



2 ^2 />! 



+ 



-Fsiz,Q' 

z 



whence we deduce the target mass corrected coefficient function 



(83) 



C73j(iV,a„r) 



2r 



1 + 2- 



N 



(84) 
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and thus K^j{^,as,a'^) using an equation analogous to Eq. ([82|) . Finally, from Ref. [87] 



whence 

CL{N,as) = CL{N,as) + 

(l + rV^)^(l-T) (3-r)(l+rV^) I \^,^ , 

47^7^ 1^' 1? FTlj^^^'^'"^^- ^''^ 

Note that in the limit Mj^/Q^ 0, t ^ 1, ^ ^ x, Cij{N,as,T) Cij{N,as), and 
Kjj{(^, Us, Ug) — > Kjj{x, as,a^) for each of / = 2, 3, L. 
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4 Neural networks and fitting strategy 



In this Section we discuss the parametrization used to represent the parton densities at 
the initial scale, the training (i.e. fitting) strategy used in our analysis, and the method 
used to determine the best fit. 

Our approach to the parametrization of PDFs is rather different from that which is 
most commonly adopted. Instead of choosing an optimized basis of functions with a rela- 
tively small number of physically motivated parameters, our PDFs use an unbiased basis 
of functions (provided by neural networks), parametrized by a very large and redundant 
set of parameters. As a consequence, the determination of the best fit form of the func- 
tions which give the PDF is not trivial since it is not just given by the absolute minimum 
of some figure of merit. Indeed, a redundant parametrization may accommodate not only 
the smooth shape of the "true" underlying PDFs, but also the random fluctuations of the 
experimental data about it. In fact, it is the possibility of further decreasing the flgure 
of merit which guarantees that the best flt is not driven by the form of the parametriza- 
tion. The best flt is then given by an optimal training, beyond which the flgure of merit 
improves because one is fltting the statistical noise in the data. 

This raises the question of how this best flt is determined. We do this through the 
so-called cross-validation method [46], based on the random separation of the data into 
training and validation sets. Namely, the PDFs are trained on a fraction of the data and 
validated on the rest of the data. A stopping criterion for the whole process emerges when 
the quality of the fit to validation data deteriorates while the quality of the fit to training 
data keeps improving: this corresponds to the onset of a regime where neural networks 
start to flt random fluctuations rather than the underlying physics. 

As explained below, fitting the neural networks to the data is performed by minimiza- 
tion of a suitably defined figure of merit. This is a complex task for two reasons: we need 
to find a minimum in a very large parameter space, and the figure of merit is a nonlocal 
functional of the set of functions which are being determined in the minimization. Care- 
fully tuned genetic algorithms turn out to provide an efficient solution to this minimization 
problem. 

In summary, the main ingredients of our fitting procedure are: 

1. Neural network parametrization, in order to have a flexible, redundant parametriza- 
tion of PDFs. 

2. Genetic Algorithm minimization, which allows an efflcient minimization on a large 
parameter space. 

3. Determination of the best flt by cross-validation, in order to determine the smooth 
physical law which underlies statistical fluctuations. 

We shall now discuss each of these aspects in turn. The results of the application of the 
method to the construction of the NNPDFl.O parton set and in particular tests of its 
stability will then be discussed in Section [5l in particular Section 15.41 

4.1 Neural network parametrization 

Each of the independent PDFs in the evolution basis introduced in Section [3T6l (S. V, T3, A^, g) 
is parametrized using a multi-layer feed-forward neural network [46] supplemented with 
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a polynomial preprocessing; this procedure is a straightforward generalization of the 
framework used in Ref. [44] to the case of several parton distributions. As explained 
in Refs. [42-44], neural networks provide a very flexible and unbiased parametrization of 
the PDFs, the only theoretical assumption being smoothness. 

The neural networks we use are chosen to have all the same architecture, namely 
2-5-3-1. This corresponds to 37 free parameters for each PDF, i.e. a total of 185 free 
parameters, to be compared to less than a total of 30 free parameters for parton fits 
based on standard functional parametrizations [28,33,40]. This choice of architecture is 
motivated by our previous studies [42,43], where it was found that it is adequate for a fit 
of the full structure function F2(x,(5^), in which both the x and the dependence are 
fitted. It is thus surely very redundant for the fit of a single PDF as a function of x at a 
fixed initial scale. Because the aim is to have a redundant parametrization, we do not find 
it necessary to use a smaller architecture even for parton distributions which are poorly 
known and will thus carry little information, such as the light sea asymmetry Ag. The 
use of a redundant architecture reduces a priori the possibility of a functional bias. Lack 
of bias will be checked a posteriori in Section [5.41 by verifying the independence of results 
on the choice of architecture. 

The neural network parametrization is then supplemented with a preprocessing poly- 
nomial. Large enough neural networks can reproduce any functional form given sufficient 
training time. However, the training can be made more efficient by adding a preprocessing 
step, i.e. by multiplying the output of the neural networks by a fixed function. The neural 
network then only fits the deviation from this function, which improves the speed of the 
minimization procedure if the preprocessing function is suitably chosen. 

We thus write the input PDF basis in terms of neural networks as follows 



The values of the preprocessing exponents m and n for each PDFs are summarized in 
Table HI They are chosen by comparison to the result of available fits [28, 33, 40] based 
on functional forms. Results should be independent of them, provided the training is 
sufficiently long, when they take reasonable values. Example of unreasonable values would 
be those which lead to the divergence of sum rules if the function NNj(x) is constant: 
the neural network should then compensate for the divergence, which eventually would 
happen, but would lead to very inefficient training. This leads to the constraints n <2 for 
the singlet and gluon and n < 1 for the valence and triplet. Independence of our global fit 
on these choices will be discussed in Section 15.41 We have further verified on individual 
replicas that results are stable upon removal of the preprocessing function, provided only 
the length of training is greatly increased. 

For three of the basis PDF parametrizations in Eq. (j87p . namely g, V and A5, we have 
factored out an overall normalization constant. The value of this constant is determined 
by requiring that the valence and momentum sum rules Eq. (j69p and Eq. (j68p be satisfied. 
The valence sum rules fix the value of the total valence and sea asymmetry normalizations 



As(x,Q2) 
9{x,QI) 



(1 -xr^x-"^NNs(2;) , 
^V'(l-x)™^x-"^NNy(x) , 

(1 -x)™^3 2;-"^3NNt3(2;) , 



^As(l - x)™^sx-"^sNNAg(x) 
^(l-xr«x-"«NNg(x) . 



(87) 
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Table 4: The preprocessing exponents used in the present analysis, defined in Eq. (IST]) . 



to be 



J^dx [{1- x)"^vNNvix)/x'^v] 

1 - Jq dx [(1 - x)'"^3 NNt3 (x) /x^^s ] 



Ij^dx [(l-x)™^sNNAg(x)/x"^s] ' 
while the momentum sum rule constrains the normalization of the gluon density 

1 - /o dx X [(1 - x)™5:nNe(x)/x"5^] 



dx X [(1 - x)'"9NNg(x)/x"9] 

The integrals are computed numerically each time the parameters of the PDF set are 
modified. We demand an accuracy of 0(10"^) for these integrals, enough for practical 
purposes, so this is the accuracy to which the sum rules will be satisfied. 



4.2 Genetic algorithm minimization 

As extensively discussed in Ref . [44] , the fitting of the neural networks on the individual 
replicas is performed by minimizing the error function 

^dat . V / ij 

where the value F^^^^^ of the observable corresponding to the z— th data point is computed 
from the PDFs as discussed in detail in Section [3l The covariance matrix used for the 
minimization is defined in Eq. (jlSp . 

Due to the non-local nature of the error function (I90p and the complex structure of 
the parameter space genetic algorithms turn out to be the most efficient method for its 
minimization. The procedure we adopted follows closely that of Ref. [44], to which we 
refer for a general discussion, while here we concentrate on improvements introduced in the 
present work. The first of these is that we allow , for each PDF j = 1, . . . , A^pdf > different 
values of the mutation rates rjij, i = 1, . . . , iVmut- This is motivated by the fact that each 
PDF functionality is different, and thus best approached using a specific learning rate. 

Furthermore, all mutation rates are adjusted dynamically during the fitting procedure 
as a function of the number of iterations A'ite 

^M = ^SKt: • (91) 
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A/" max 




A^cop 




-^update 


[10,1] 


[10,1] 


[1,0.1] 


[1,0.1] 


[1,0.1] 


5000 


1/3 


120 


3 


10 



Table 5: Parameters controlling the genetic algorithm minimization. Since we work with 
-^mut = 2 there are two entries in each column for the values of r]^^\ 

As the algorithm gets closer to the minimum, large mutations become more likely to 
increase the value of the error function: they would then be rejected thus making the 
algorithm highly inefficient. This is prevented by the reduction of the mutation rate as 
the minimization proceeds. 

The initial values of the mutation rates for each PDF are collected in Table [5l together 
with the other parameters which control the genetic algorithm. It has been found that the 
choice of two mutations per PDF, A^^^t = ^rmit = 2, is optimal. To see how this works, 
consider for instance the neural network NN^ for the singlet PDF. This is trained with a 
genetic algorithm with two mutations which are initially set to ?7i^s = 10 and r/2,s = 1- 
Both mutation rates then decrease as l/Nne^f^. The learning rates for the remaining 
PDFs are given in Table [5j 

At each iteration of the genetic algorithm we generate A'^cop copies of the PDF param- 
eters, and perform A'mut x A'pdf mutations on each of the copies. The copy which yields 
the lowest value of the figure of merit Eq. (I90p is then chosen as a starting point for the 
following iteration. We have found no advantage in using probabilistic methods for the 
selection of the best PDF parameters. This is because we are not looking for the absolute 
minimum of the error function: rather, the training procedure must be stopped at some 
point to avoid overlearning as we shall explain later. The selection of the copy with the 
lowest error is then best suited for our strategy. 

Since we start from a random configuration, and since the neural networks allow for 
great flexibility, it may turn out that some or all the integrals which appear in Eqs. (|88t - 
[89]) are divergent, especially at earlier stages of the fitting. Similarly, some configurations 
may lead to negative values of Fl, as discussed in Sec. 13.71 To suppress these unphysical 
configurations we added a large penalty to the error function Eq. (|90p . which means that 
they are never selected. 

In order to deal more efficiently with the needs of fitting data from a wide variety of 
different experiments and different data sets within an experiment we adopt a weighted 
fitting technique, following our earlier study in Ref. [44]. The aim of the technique is to 
let the minimization procedure converge rapidly towards a configuration for which the 
final \^ is even among all the experimental sets. Weighted fitting consists of adjusting the 
weights of the data sets in the determination of the error function during the minimiza- 
tion procedure according to their individual figure of merit: data sets that yield a large 
contribution to the error function get a larger weight in the total figure of merit. In order 
to avoid any source of bias, however, weighted fitting is only used at intermediate stages 
and it is switched off when approaching the minimum. 

The way weighted fitting is implemented is by minimizing the error function 

= 7^E^'f^dat.i^f , (92) 
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(k) 

where -/Vdat,j is the number of data points of the j—th set and Ej the error function 



defined in Eq. (f90]) but restricted to the points of the j—th. dataset (see Tabled]). This is 

{k 



therefore a weighted version of the original error function E^^^ Eq. (j90p . The weights pf^^ 
are determined as 

\ J-'max / 

(k) . . (k) 

with Ema.x being the highest among the Ef' at the given GA generation. Their values are 
updated every A'^update generations, with default A'^update = 10 

An important feature of the implementation of weighted training is that weights are 
given to individual data sets, as identified in Tabled! and not just to experiments. This is 
motivated by the fact that typically each data set covers a distinct, restricted kinematic 
region. Hence, the weighting takes care of the fact that the data in different kinematic 
regions carry different amounts of information and thus require unequal amounts of train- 
ing. 

This procedure poses the problem that different sets coming from the same experiment 
are correlated with each other, as discussed in Sect. 12. ll and these correlations are neglected 
in the evaluation of Eq. ()92p . To deal with this problem, the weighted training is divided 
in two stages. In the first stage the weighted error function Eq. (j92p is minimized. When 

(k) 

the total figure of merit is below a threshold E^^: < insets i weighted training is switched 
off by setting pj = 1, and the unweighted figure of merit Eq. ()90p which retains all 
correlations is then minimized until stopping (convergence). The procedure ensures that 
first, a uniform quality of the fit for all data sets is achieved, and then the fit is refined using 
the correct figure of merit which includes all the information on correlated systematics. 

A final improvement of the minimization procedure makes use of the stopping criterion 
which will be described in detail in Sect. 14. 3[ Indeed, it might happen that the stopping 
criterion Eqs. (|95ll96p is met for one or more individual experiments. If the criterion were 
generally satisfied, the fit would have reached convergence and further training would lead 
to overlearning, i.e. the fitting of fiuctuations. Thus, if the criterion is met by a single 
experiment, in order to avoid overlearning of that experiment, its weight in Eq. (I92p is 
temporarily set to zero, so the experiment is effectively removed from the training set. 
The behaviour of the training and validation error functions for the experiment are then 
monitored and, and if it exits the overlearning regime its weight is restored to a nonzero 
value. 



4.3 Determination of the optimal fit 

We now come to the formulation of the stopping criterion, which is designed to stop 
the fit at the point where it reproduces the information contained in the data but not 
its statistical fluctuations, that is, before the training of the neural networks enters the 
overlearning regime. The criterion is based on the cross-validation method, widely used in 
the context of neural network training [46]. Its application to our case has been described 
in detail in Ref. [44] . 

(i) 

First, the data set is partitioned into training and validation subsets with fraction /^.^ 
and f ! = 1 — f^^ of the data points respectively. The values of the fractions can in general 
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^ ' smear 


^smcar 


<5tr 


l^val 


-E'thrcs 


jymax 


45 


13 


10-4 


10-4 


6 


5000 



Table 6: Parameters controlling the best-fit stopping criterion. 



be different for each experiment. The points in each set are chosen randomly out of the 
total dataset. In our fit, this random partitioning of the data is different for each replica, 
thereby ensuring that on average all the information in the original data set is retained. 
Then, the figure of merit Eq. (j90p or Eq. (j92p is minimized for the points in the training set, 
while the corresponding figure of merit for the points in the validation acts as a control: it 
is computed, but not used for minimization. The minimization is stopped when the figure 
of merit keeps improving for the training set, but it deteriorates for the validation set. 
This behaviour signals the fact that we are fitting the statistical fluctuations of the points 
in the training set rather the underlying physics which is supposed to describe both the 
training and the validation data. 

The implementation of this method here follows closely Ref. [44]. We take the same 
value of the training fraction for all data sets, fl^ = ftr = \- In Section [5.51 we shall also 
consider a lower value /tr = \ and show that it leads to essentially unchanged results. 
The partitioning of data points into training and validation sets is done on each data set 
independently. This ensures that all data sets (and thus essentially all kinematic regions) 
are represented in the training and validation sets for each replica. 

Whereas usually when using a genetic algorithm the figure of merit cannot increase 
during the minimization, with the weighted training algorithm discussed in Sect. 14.21 the 
value of the figure of merit during minimization oscillates due to updating of the weights. 
The wavelength of this oscillation is set by the value of A^updatei i-e- the frequency with 
which weights are updated. In order to avoid spurious stopping induced by these oscilla- 
tions, we apply the stopping criterion to moving averages computed over a given number 
of iterations, namely 

1 * 

(^tr,val(0) = ^tr,val(0 • (94) 

We take as a default value for the averaging ("smearing") N^ra = 45. The value is chosen 
to be unequal to a multiple of the wavelength and larger than two full periods, in order 
to minimize spurious fluctuations. 

The stopping criteria are then satisfied if the averaged training error function is de- 
creasing 

rET-r- X ^\ < 1 - c'tr , (95) 

\^tr{^ — ^smearj/ 

while the averaged validation error function increases 

(^val(i)) 



(^val(^ Asjncar)) 



> 1 + ■ (96) 



The parameters 6tr, (Jvai set the accuracy to which the increase and decrease is required in 
order to be significant. Their value has been determined as 5tr = ^vai = lO'^ by verifying 
that with much larger values (more than an order of magnitude) a significant fraction 
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Figure 3: Training and validation error functions as a function of the number of iterations 
for one of the rephcas in the reference fit. 



of fits never stops, while with much smaller values a sizable fraction of fits stops due to 
fluctuations. Results are unchanged upon moderate variations of the values of 6ti, ^vai- 

A graphical example of how the stopping criterion works in practice is given in Fig. [3l 
where the moving averaged training and validation error functions Eq. (j94p are plotted as 
a function of the number of generation for one particular replica of our reference fit, whose 
training has been artificially prolonged beyond stopping point. Overlearning is apparent 
as a rather small though visible effect: beyond the stopping point the training figure of 
merit keeps decreasing steadily while the validation flattens out and actually rises by a 
small amount. The smallness of the rise is a consequence of the fact that the data set is 
very large and mostly quite consistent with itself. 

In order to avoid unacceptably long fits, when a very large number of iterations iV™^ 
is reached the training is stopped anyway, even if the stopping conditions Eqs. ()95|96p are 
not satisfied. This of course leads to loss of accuracy of the corresponding fits, and it is 
acceptable provided it only happens for a small fraction of replicas. We will verify that 
this is indeed the case in Section 15.11 The full set of stopping parameters is summarized in 
Table [H We have verified that our final fit is stable against variations of these parameters 
as well as those of Table [5l 

The set of neural nets at stopping provides our best fit, but it is otherwise impossible to 
endow the best fit values of their parameters with a physical interpretation. In fact, since 
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T3(X.Q;) 




Trained neural network weights 



Figure 4: Distribution of neural network weights for A'rep = 100 replicas of the PDF 
T^lx, Q^). The plot shows the percentage of weights which take the value given in abscissa. 

the nets are redundant, the values of most of these parameters is unconstrained or zero. 
As an example, the distribution of neural network weights at stopping for 100 replicas of 
the triplet neural network T3{x,Qq) is displayed in Fig. [H The well-balanced distribution 
of weights around zero in Fig. U] shows that the individual neurons in the neural network 
operate in their natural range of sensitivity. 
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5 Results 



We have used the methodology discussed in the previous sections to produce a set of parton 
distributions, which we refer to as the NNPDFl.O parton set. As discussed in Section ESI 
this parton set is based on five independent parton distributions, corresponding to the 
two Ught flavours and the gluon; the strange distribution is assumed to be proportional 
to the light sea according to Eq. (j67p . while heavy flavours are generated dynamically 
using a ZM-VFN scheme, as discussed in Section 13.41 Evolution is performed at NLO as 
discussed in Sections 13.2113.31 with as(M|) = 0.1f9. The heavy quark thresholds are at 
= Qo = \/2 GeV, mi, = 4.3 GeV and nit = 175 GeV. 

We now present the NNPDFl.O PDFs, describe some general statistical features of 
the fits, compare our PDFs to other available parton sets, investigate their statistical 
uncertainties by discussing their stability with respect to changes in some of the underlying 
assumptions, and discuss the theoretical uncertainties related to the perturbative order 
and value of as- We shall then present some results obtained using the NNPDFl.O set 
for DIS observables as well as for a few benchmark LHC observables, and compare them 
to those found using other available sets. The methodology used to obtain estimates for 
central values and errors from our parton set is summarized and compared to that used 
with other sets in Appendix iBl 

5.1 The NNPDFl.O parton set: statistical features 

Our full parton set consists of an ensemble of Nj-^p = 1000 sets of five PDFs. The general 
statistical features of the global fit are summarized in Tables EllTl Here (TL) denotes the 
average number of iterations of the genetic algorithm at stopping, as defined in Sect. 14.21 
14. 3[ The other statistical estimators are defined in Appendix B of Ref. [44] 

The distribution of values of the error function Eq. ([90|) at stopping is displayed in Fig. [5] 
along with the distribution of training lengths. The former appears to be approximately 
gaussian and the latter approximately poissonian, with a long tail which causes a slight 
accumulation of points at = 5000 iterations (see Table [6|) . This may cause a loss in 

accuracy in outlier fits, which however make up fewer than 5% of the total sample. For 
completeness, we also show in Fig. [6] the distribution of values of the of ^ global fit to 



E,^ distribution for MC replicas | Distribution of training lenghts | 




eJ^' Training lenght [GA generations] 



Figure 5: Left: distribution of and of TL'-'^-' among the set of trained replicas. The percentage 
value in given in the y axis. 
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the data obtained by using each individual rephca PDF set, instead of the best-fit PDF 
set, whose shape is very similar to that of the distribution of error functions. 
The features of the fit can be summarized as follows: 

• The of the central fit (obtained averaging over all PDFs in the sample) is about 
half of the average value {E) of the error function Eq. (j90p which measures the 
quality of the fit of each set to its corresponding replica. This is as expected for a 
fit which correctly reproduces a "true" behaviour underlying a set of experimental 
data [42-44], given that replicas fluctuate about the experimental measurements, 
which in turn fluctuate about "true" values. It thus supports the use of this central 
fit as our best fit (see Appendix [B]) . A similar conclusion is obtained by comparing 
the distribution of values of the error function of Fig. [5] with the distribution of 
values of the shown in Fig. [6l 

• The quality of the central fit as measured by its = 1-34 is close to its expected 
value. With several thousands of data points, the statistical fluctuations of 

of order of a few percent; moreover, the inaccuracy of the NLO theory used here is 
likely to contribute 5-10% percent to the value of x^- Therefore, this value of 
might indicate some minor data inconsistency. We shall come back to the issue of 
the self-consistency of the data in Sect. 15.51 

• The quality of the flt to individual experiments is fairly uniform. The only 
grows somewhat larger for the NMC experiment, which is known to have consistency 
problems [28,43]. 

• The uncertainty of the flt, as measured by the standard deviation (cr) is rather 
smaller than that of the experimental data: 0.014 versus 0.055. Correspondingly, 
the correlation (p) between pairs of data points is larger by almost the same factor 
in the fit than in the experimental data. This can be either a consequence of the 
fact that the fit is biased by an exceedingly restrictive parametrization, or that it 
is reproducing an underlying physical law which the data consistently follow [28,42, 
43]. We will show in Sect. 15.41 that the former possibility is strongly disfavoured. 
Therefore, we conclude that the latter is the case: this reinforces the conclusion that 
our central fit provides an optimal best fit to the data. 

5.2 The NNPDFl.O parton set: results 

The NNPDFl.O set of parton distributions at the starting scale Qg = 2 GeV^ is displayed 
in Fig. [7] (singlet sector) and Fig. [8] (valence sector), as a function of x both on a linear 
and a logarithmic scale. The PDFs shown are those which we adopt as a basis set, given 
in Eq. (I87p . from which other PDFs can be obtained by linear combination. Relative 
uncertainties on these PDFs are shown in Fig.[9l Results are compared to those of the other 
parton sets CTEQ6.5 [30], MRST2001E [33] and Alekhin02 [88], which are the most recent 
NLO sets from the respective groups available through the LHAPDF interface [89,90]. All 
uncertainties in these plots (including those of other parton sets) correspond to nominal 
one-c7 error bands. In Fig. [10] we also show the shape of individual replicas and the error 
band computed for two sets of 25 and 100 replicas of the gluon PDF. 
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Table 7: Statistical estimators for the final PDF set with A^icp = 1000 for the total data set. 
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Table 8: Statistical estimators for the final PDF set with A^rcp = 1000 for individual experiments. 

[x' dislribution for MC replicasH 
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Figure 6: Distribution of to the fit to actual data of each replica. The percentage value in 
given in the y axis. 

The values of the momentum and valence sum rules computed from NNPDFl.O using 
Eqs. ()68ti69p are correct to within one part per mille, as expected from the accuracy of the 
numerical integration noted in Sec. 14.11 

The general features of the PDF set can be summarized as follows: 

• Even though individual replicas may fluctuate significantly (see Fig. [TO|l thereby 
showing the flexibility of the neural parametrization, this appears to be due to 
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Figure 7: The singlet and gluon PDF at the starting scale Qq = 2 GeV^, plotted versus a; on a 
log (left) or linear scale (right). 



statistical fluctuations of the underlying replica data: average quantities, such as 
the central value and error band shown in the figure, are smooth, all the more so 
as stability is reached as the number of replicas increases. It is clear from the plot 
that with A'rcp = 100 replicas the results for central values and error bands have 
stabilized already, as we shall discuss in more detail in Sect. [53] below. 

• The central values of all PDFs are in reasonable agreement with those from other 
parton sets, especially in the region where data are available. 

• Even though the uncertainty band on the gluon allows for a negative PDF for x < 
IQ-"^, positivity of Fl holds for x > 10"^ and > Qg. 

• Uncertainties on PDFs in the region where data are available tend to be generally a 
little larger than those of the CTEQ6.5 and Alekhin02 sets, and rather larger than 
those of the MRST2001E sets. Note that the one-o" uncertainty band that we find 
without having to introduce a tolerance is thus comparable to or larger than the one- 
a uncertainty bands obtained by the CTEQ and MRST-MSTW groups with their 
respective tolerance criteria, which amount to an upward rescaling of all experimental 
uncertainties by a factor between four and six. 

• Uncertainties on PDFs in the region where no data are available tend to be larger 
than with any other set: this applies to the singlet and gluon at very small x, to the 
valence and triplet at small x, and to the gluon at large x. 
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• Some aspects of the NNPDFl.O PDFs are unconstrained by data which are used in 
the CTEQ and MRST global fits but not in the data set of Tab. [TJ This includes 
the u — d asymmetry, which in global fits is mostly constrained by Drell-Yan data, 
and the large x gluon, which in global fits is constrained by the large- -Ey jet data. 
Hence, the wider uncertainty bands found for these quantities are likely to be at 
least partly due to the smaller data set used in the present fit. 



5.3 Parton-parton correlations 

Besides the uncertainty bands on the PDFs, it is interesting to also determine the cor- 
relations between different PDFs in a given set: the relevance of these studies for the 
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Figure 9: Relative uncertainty on PDFs. All bands correspond to one a (see Appendix lB]) . 



assessment of the uncertainties on LHC has been recently emphasized in Ref. [91]. The 
determination of the correlation between any pair of quantities depending on PDFs us- 
ing NNPDFl.O is straightforward, as the correlation can be simply calculated using the 
covariance as its estimator. For example, the correlation between the two PDF values 
q{xi,Q\) and q{x2-,Q'\) is determined as 

with all averages performed on the ensemble of replicas. Results for some selected corre- 
lations are displayed in Fig. [TTJ For ease of comparison, we have chosen the same PDFs 
and scales that were studied in Ref. [91]. The pattern of correlations is similar to that 
found and discussed in that reference. 
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Figure 10: Sets of 25 replicas (left) and 100 replicas (right) of the gluon distribution. The 
solid red (dark) lines show the average and one-sigma intervals computed from the given 
sets. 

5.4 Statistical uncertainties: parametrization independence 

As discussed in Section [1.21 an advantage of the NNPDF approach is that various features 
of the PDF set can be assessed using standard statistical tools. In this Section, we assess 
the stability of the fit and the reliability of the error estimate provided with it. 

The main estimator used in this analysis is the distance d[q]. This is defined as the 
difference in values between two different predictions for some quantity q obtained from 
two different ensemble of PDFs, measured in units of the sum in quadrature of their 
uncertainties: thus if two determinations of q have d = 1 this means that their difference 
is equal to the sum in quadrature of their respective uncertainties, (see appendix B of 
Ref . [44] for a more extensive discussion) . We compute this distance for each PDF in turn 
and for the uncertainty on it, at the starting scale Qg = 2 GeV^. Specifically, we quote 
values for the distance in various ranges x, by computing it at ten equally spaced points 
in X in the given range and averaging the results. In order to guarantee accuracy of the 
results, the computation is repeated a large number of times over different sets of replicas 
(of order of a hundred). In practice, since we only have a limited total ensemble of PDFs, 
the repetitions are performed by taking different subsets of the full ensemble. 

At first, we check that the results of the fit are stable when using different subsets 
of 100 replicas out of the full ensemble of 1000 replicas. Results are shown in Table [9] 
for all PDFs and corresponding uncertainties, in two kinematical x regions which are 
schematically identified as a region where the PDF is mostly controlled by data, and a 
region where it is mostly extrapolated. It is clear that the results of the fits are indeed 
behaving as they ought to. It is important to notice that distances listed in the table refer 
to an average over 100 replicas, whose standard deviation is by a factor y^^rep smaller 
than the standard deviation of each of the replicas. Hence, the fact that d ~ 1 for two sets 
of 100 PDFs means that the central values of these PDFs differ by a tenth of the sum in 
quadrature of their standard deviations. It follows that Table [9] also verifies that indeed 
the fluctuation of results obtained as an average over A^rep replicas scales as 1/ y^/V^p as 
it ought to. 

Next, we study the dependence of results on the architecture of PDFs. Specifically, 
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Figure 11: Correlations between PDFs, computed using Eq. (j97p . The correlations shown 
are u — u (upper left), g — g (upper right), c — c (lower left) and g — s (lower right, g 
in abscissa). The labels on the axes are the logarithm to base ten of xi and X2- All the 
correlations are computed at Q = 85 GeV. 

we reduce the architecture from 2-5-3-1 to 2-4-3-1, thereby decreasing the number of 
parameters of each PDF from 37 to 31. Results are given in Table [TOl We find remarkable 
stability: fluctuations are at most at the two-o" level in poorly controlled quantities, such as 
the value of the light quark sea asymmetry in the extrapolation region, or the uncertainty 
on the isotriplet combination in the extrapolation region. This shows that results are 
indeed independent of the number of parameters. 

A more subtle issue related to the parametrization is the choice of preprocessing 
functions, introduced in the relation Eq. (|87p between PDFs and their neural network 
parametrizations. These functions, as discussed in Sect. 14. 1^ are introduced in order to 
speed up the training but should not affect final results. We have thus checked the stabil- 
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Table 9: Distance between results obtained from two different sets of 100 PDFs out of the full 
ensemble of 1000 PDFs. 
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Table 10: Distance between results obtained from a sets of 100 PDFs with neural network archi- 
tecture 2-5-3-1 and a sets of 100 PDFs with neural network architecture 2-4-3-1. 

ity of result upon variation of the preprocessing exponents away from their default values, 
listed in Tabled 

First, in Table. [TT]we display the dependence on preprocessing exponents of the values 
of the total and of the average training length (TL). The table shows that in some cases 
the quality of the fit deteriorates and the average length of training increases significantly 
when the preprocessing exponents are varied. This implies that in these cases a satisfactory 
fit has not been obtained: indeed, in such cases there is a noticeable increase in the fraction 



46 



Valence sector 






Singlet sector 










(TL) 






(TL) 


riTs = nv = 0.1 


1.38 


771 


ns = ng = 0.8 


1.39 


1002 


n-Tg = nv = 0.5 


1.34 


1629 


nj: = ng = 1.6 


1.52 


2287 


rriTo, = my = 2 


1.55 


1186 


ms = nig — 1 = 2 


1.37 


647 


rriT^ = mv = 4 


1.28 


1311 


ms = nig — 1 = 4 


1.41 


1306 



Table 11: Values of the total ^^id of the average training length (TL) when preprocessing 
exponents are varied away from their default values Tab. [4l The value of the exponents which are 
varied is given for each row. 

of replicas which reach the maximum training length of V™^^ = 5000 without achieving 
convergence. Hence, an increase of N^^, or possibly a more efficient or differently tuned 
minimization algorithm would be required to obtain a satisfactory fit. Thus, in these cases 
we expect reduced stability of results: this applies to the case of increase of the small x 
preprocessing exponent n for the singlet, and to a lesser extent to the decrease of the large 
X preprocessing exponent ni for the nonsinglet and valence. 

In Table [12] we show the distance from the default of results obtained as the pre- 
processing exponents are varied. In almost all cases, remarkable stability is found, with 
variation of results well within the 90% confidence level. Even in the case of increase of 
the singlet n exponent mentioned above reasonable stability is found. A notable excep- 
tion is the behaviour of the triplet and the valence upon variation of the large or small 
X exponents. In these cases, the distance is of order ten without significant deterioration 
of the fit quality: this (with 100 replicas) means that central values differ by about 1.4 o" 
in units of the respective standard deviations. We must conclude that in these cases we 
do not find complete independence of results on the preprocessing function, and therefore 
that uncertainties on our quark distributions in the valence region are likely to be underes- 
timated by a factor between one and two. A more faithful estimate of these uncertainties 
would require for example random variation of the preprocessing exponents within the 
range where a good fit quality obtains. However, it should be noted that uncertainties on 
valence PDFs are going to be greatly reduced when Drell-Yan data are included in our 
data set. 

5.5 Statistical uncertainties: dependence on the data set 

Having verified the reliability and independence of our results on the parametrization, we 
perform a series of checks to see how they behave when data are excluded from the fitting 
set. The aim of these tests is to make sure that our results are not fine-tuned to our choice 
of a specific data set, and thus that the same approach can lead to satisfactory results 
with data sets of different sizes and with different features. 

At first, we simply reduce our data set to the much smaller set of Table [T3l with a 
total of iVjat = 733 data points, to be compared to the Vjat = 3163 of the full fit. This 
is the same data set which was used for the benchmark Ref. [41], mentioned in Sect. II. 1[ 
A comparison of the up and antidown PDFs obtained in this way with the ones of the 
reference fit is shown in Fig. [T2j Clearly, as data are removed the uncertainty bands 
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Table 12: Distance between results found with a set of 100 PDFs with the default preprocessing 
exponents Table [TTl and 100 PDFs obtained with a different value of some of the preprocessing 
exponents. The data and extrapolation regions for each PDF are the same as in Table HI The 
preprocessing exponents which are varied are listed in the table, with Uy = rtji, ~ ny, rriy = mx^ = 
mv, Us = riY. = rig and rus = uiy, — mg — 1. 

increase in the region where there is information loss (such as large x for the sea and 
small X for the valence), but central values remain compatible within uncertainties. This 
shows that the parton parametrization with neural nets is flexible enough to accommodate 
results from either of these data sets, and it confirms that results are indeed independent 
of the parametrization. 

This increase in error band when data are removed indicates that the data included 
in the fit are mostly consistent with each other, i.e. they lead to consistent underlying 
PDF sets. Indeed, if there are systematic data inconsistencies, when the inconsistent data 
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Table 13: Reduced data set used for benchmarking. 




Figure 12: Comparison of the NNPDFl.O reference PDFs to those obtained from a fit to the 
reduced data set of Table [T51 The PDFs shown at the starting scale Qq = 2 GeV^ are u{x) (left 
plot) and d{x) (right plot). 



are a small subset of the full data set they will on average be fitted poorly by the best-fit 
PDFs. This is due to the fact that when these data are included in the training set an 
improvement of the quality of the fit to them does not generally lead to an improvement 
of the quality of the fit to the remaining data in the validation set. This behaviour was 
observed in our fits of Ref. [43], when fitting some NMC structure function data which 
are known [28] to be somewhat inconsistent with the rest. If instead two sets of data 
are inconsistent and roughly of the same size, the best fit PDFs will fluctuate between 
those who give a good fit to either of the datasets, according to which data are randomly 
included in the training set of each replica. A mild inconsistency of this kind was recently 
observed in our fits when comparing ZEUS and HI structure function data [92]. 

In neither case will the inclusion of inconsistent data lead to a decrease of the uncer- 
tainty, and in the latter case it might even lead to an increase of the uncertainty, if the 
systematic discrepancy between the two datasets is sizably larger than the uncertainty of 
each of them. In all cases, inconsistencies will be signalled by a large value of the of the 
best fit. The behaviour displayed in Fig. \T2\ together with the low value of the = 1-34 
in Table [71 suggests that only minor insonsistencies are present in the NNPDFl.O dataset, 
although the mild inconsistencies such as those within the NMC dataset and between HI 
and ZEUS datasets mentioned above are likely to be present. 

A more detailed check of stability can be performed by removing data in a fixed 
kinematic region. To this purpose, we repeat the reference fit but with the cut in raised 
from the default Q^^^=2 GeV^ to Q'^^^=10 GeV^. As can be seen from Fig. [21 this removes 
from the analysis a sizable amount of data, leaving A'dat = 2355 out of the A'dat = 3163 of 
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Table 14: Distance between results computed from a set of N^cp — 100 replicas from the reference 
fit and a set of iVicp = 100 replicas from a fit where the kinematic cut in has been raised from 



)2^i„ = 2 GeV2 to Q^i„ = 10 GeV^. 
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Figure 13: Comparison of the reference fit to a fit in which where the kinematical cut in has 
been raised from Qmin = 2 GeV^ to Q^jn — 10 GeV^. The gluon (left plot) and the total valence 
PDFs (right plot) are shown. 

the reference fit. Results found in this case are displayed in Fig. [T3l which clearly shows 
the increased uncertainty in the small x region where data have been removed, and good 
stability in the valence region where the data set is essentially unchanged. 

This is assessed in a quantitative way in Table [T^ by tabulating the distance between 
results in the three kinematic regions (a) where data are available both before and after 
raising the cut (Data/Data); (b) where there are data before raising the cut but not 
afterwards (Data/Extrapolation); (c) where there are no data either with the lower or 
higher cut (Extrapolation/Extrapolation). In most cases we find good stability at the 
90% confidence level: as shown in Fig. [13] whenever central values change significantly, the 
uncertainties increase correspondingly. 



50 







Ext r cipol&t ion 




^ 1 0"'* < T- < 1 


1 < -7- < 1 0"'^ 


{d\Q\) 


2.13 


1.95 


{d\cT\) 


1.17 


1.83 




1 n^* < T < n 1 


±U \ X ^ -LU 


(d\a]) 


1.76 


1.82 


(d\a]) 


1.29 


1.87 


J^3\X, L^oJ 


U.UO ^ X _^ U. / 


lU ^ X ^ lU 


{d[q]) 


1.07 


0.93 


{d[a]) 


1.17 


1.59 




0.1 < X < 0.6 


3 10"^ <x <3 10"^ 


{d[q]) 


1.42 


1.62 


{d[<j]) 


1.56 


1.64 


^s{x,Q''„) 


0.1 < X < 0.6 


3 10"^ <x<3 IQ-''' 


{d[q]) 


0.99 


1.71 


{d[<j]) 


1.35 


1.73 



Table 15: Distance between results computed from a set of N^ep = 100 replicas from the reference 
fit and a set of A'icp = 100 replicas from a fit where the training fraction has been reduced from 
its default ft, = 0.5 to ftr = 0.25. 

An interesting exception is the case of the Data/Extrapolation region for singlet and 
gluon. This is the kinematic region of most of the HERA data: raising the cut corresponds 
to excluding the HERA data from the fit. In this case, even though the central reference 
is within the uncertainty band of the fit with the higher cut (see Fig. [T3|) . the distance 
between central values is rather larger than allowed by statistical fluctuations. This can 
be understood as a consequence of the fact that the behaviour of the gluon at small x in 
the HERA region cannot be predicted by simple extrapolation of the behaviour observed 
at larger x in the NMC region. Similar behaviour was observed in structure function 
fits [43] , though here the problem is alleviated by the fact that for a wide class of starting 
quark and gluon distributions the shape of structure functions obtained from NLO QCD 
evolution is universal [93]. 

As a final test, we reduce the number of data points by randomly removing points from 
the data set. This is simply done by reducing the fraction ftr of points in the training 
set from its default value ftr = 0.5 (see Sect. 14. 3p to ftr = 0.25. As shown in Table [T5l 
results move by at most two-o" as a consequence. Furthermore, in this case uncertainties 
are unchanged: for instance, the value of ('7''°°*'')dat coincides with that of the reference 
fit given in Tab. [8l Hence, the fit behaves as one expects when the data set is changed, 
but with no significant information loss. This indicates that the data set is consistent and 
quite redundant. Of course, if the training fraction were reduced further eventually one 
would have information loss, and one would be led back to the situation of smaller sets of 
experiments or kinematic cuts. 
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Table 16: Statistical estimators for the LO fit and the NLO fits with different values of as (top) 
and distance between results computed from a set of A'rcp = 100 replicas from these fits and set of 
iVrep = 100 replicas from the reference fit. 




X X 



Figure 14: Comparison of the NLO (reference) and LO fit results for the gluon (left) and quark 
singlet PDFs (right). 

5.6 Theoretical uncertainties 

In our determination of PDFs, all systematic uncertainties in the data have been ac- 
counted for in the Monte Carlo data generation, as discussed in Sect. [2] they are then 
propagated through the fitting procedure onto our final results. Therefore, error bands on 
the NNPDFl.O PDFs already include both statistical and systematic uncertainties on the 
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data. On top of these, however, there exists theoretical uncertainties due to the fact that 
we do not fit directly experimental observables, but rather PDFs, which are theoretical 
constructs of QCD, and are thus affected by uncertainties related to incomplete knowledge 
of the theory: we shall refer to these (sometimes called model uncertainties) as theoretical 
uncertainties. 

The main theoretical uncertainties on NNPDFl.O are those related to the fact that 
our analysis is performed at NLO, and thus neglects effects at NNLO and beyond, and to 
the choice of value of the strong coupling as, which is only known to finite accuracy. To 
address these issues, we have repeated our fit at LO, and also at NLO with two different 
values of the strong coupling, as(M^) = 0.117 and as(M|) = 0.121. The results of these 
studies are summarized in Tab. \T6\ where we show the statistical features of these fits, to 
be compared with those of the reference fit Tab. [HI and the distance of the result of these 
fits from those of the reference. 

Comparison of the LO fit to the NLO shows that the quality of the fit deteriorates 
significantly when using LO theory, as one might expect. However, the size of the uncer- 
tainties remains essentially unchanged, as shown by the value of both (o'^'^^*^)^^^^ and of 
d[a]. Furthermore, all central values of the LO fit vary by an amount which is compatible 
with statistical fluctuations, with the exception of the singlet and gluon, which vary by 
about two-cT, as one expects due to the fact that the gluon contribution to deep-inelastic 
coefficient functions only starts at NLO. The change in behaviour of the singlet and gluon 
is displayed in Fig. [HI 

The comparison of the LO and NLO fits also indicates that the theoretical uncertainty 
due to lack of inclusion of NNLO corrections is negligible on the scale of statistical un- 
certainties. This conclusion is based on the observation that NNLO corrections to all 
PDFs are known [36,94] to be a fraction of the typical NLO corrections (i.e. the NLO-LO 
difference) in the nonsinglet sector, which we already find to be smaller by a factor larger 
than two than statistical uncertainties. 

Comparison of the NLO fits with different values of shows a variation in which, 
though small in terms of the expected statistical fluctuations of the itself, is actually 
quite significant in absolute value and thus for the determination of as (see Ref. [95] for 
an explanation of this distinction). The possibility of such a determination as will be 
explored further in future studies. The change in central values of PDFs is compatible 
with statistical fluctuations, except in the case of the gluon, as one expects due to the fact 
that the gluon contribution to deep- inelastic structure functions is of order as- Even in 
this case, however, the change is a fraction of the statistical uncertainty. 

We conclude that the dominant theoretical uncertainties are small on the scale of the 
statistical uncertainties in our fit. If this will no longer be the case in future analyses, 
theoretical uncertainties can be taken into account by varying the underlying parameters 
in a randomized way during the fitting procedure. For example, the uncertainty on the 
value of as could be accounted for by taking it as a random variable distributed about its 
central value with a given uncertainty, and letting it fluctuate between replicas. For the 
time being, however, this does not appear to be necessary. 

Further possible sources of theoretical uncertainty include effects related to large x and 
small X resummation of the perturbative expansion, the treatment and position of heavy 
quark thresholds, higher twist corrections, and nuclear effects especially in the treatment 
of neutrino data. All these corrections are expected to be smaller than those discussed 
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here. They will addressed in future studies as the precision of the analysis improves. 

5.7 Usage and delivery of the NNPDFl.O set 

The statistical features of the NNPDFl.O parton set are those of the full ensemble of 
-^rep = 1000 replicas. However, as we have explicitly verified in Sect. 15. 4[ the scaling 
with the value of A^rep of results obtained from NNPDFl.O works just as one would expect 
on naive statistical grounds, and therefore a smaller number of replicas may be sufficient 
for the computation of quantities whose uncertainty decreases fast enough with A'^^ep. For 
example, as is well known [45], the variance of the mean of a sample of size A'rep is s^/A^rep, 
where s is the standard deviation of the underlying distribution from which the sample is 
taken. Similarly, the variance of the variance itself is (for gaussianly distributed quantities) 
2s^/(A'rep — 1), Therefore, if the standard deviation corresponds to a 10% uncertainty, a 
sample of 100 replicas will be sufficient to determine central values with 1% accuracy and 
the uncertainty itself with 5% accuracy. 

The computation of quantities that depend on more detailed statistical features of the 
sample may be more delicate, and require the use of the full ensemble. An example is 
the computation of correlations: indeed, not only does the uncertainty on correlations 
only decrease as 1/ -^/iVrep, but also the correlation computed with the sample is only 
asymptotically an unbiased estimator, since it overshoots the true one by an amount which 
only vanishes as 1/A^rep- Therefore, the stability tests of Sects. [5^5. 51 were computed using 
sets of A^rep = 100 replicas, but the correlations displayed in Fig. [11] were determined using 
the full set of A^'^ep = 1000 replicas. 

For these reasons, we will provide both the full set 1000 replicas, and a restricted set 
of 100 replicas. Of course, other sets can be extracted as needed from the full set. The 
NNPDFl.O sets can be downloaded from the web site http:/ /sophia.ecm. ub.es/nnpdf/, 
together with instructions for interfacing to commonly used codes. They can also be ac- 
cessed through the LHAPDF library [89]. The computation of central values and errors 
using Monte Carlo PDF sets is briefly summarized in Appendix [Bl 

Given the statistical information contained in the full ensemble of PDFs, it is possible 
to construct optimized sets which consist of a smaller number of replicas, but whose 
statistical properties are closer to those of the full ensemble than those of a random subset 
of it. This can be done by picking, out of the full set, a subset that minimizes a measure 
of the distance between the probability distribution of the subset and that of the full set, 
such as the relative entropy (or Kullback-Leibler divergence) [96]. This construction is 
the subject of current investigation and it will be presented elsewhere. 

5.8 Comparison with present and future experimental data 

A full study of the phenomenological implications of the NNPDFl.O parton set is beyond 
the scope of this work. However, for the sake of illustration, in this section we present 
first comparisons of theoretical predictions obtained using NNPDFl.O with data, both for 
deep-inelastic observables included in the fit and for some LHC observables. 

In Fig. [15]the theoretical prediction obtained using NLO QCD and the NNPDFl.O set 
is compared to the data, for some representative deep-inelastic observables included in the 
data set of TablelH Results obtained using the most recent NLO sets from other groups [30, 
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Figure 15: Comparison of NLO theoretical predictions and data for various observ- 
ables included in the NNPDF fits. First row: proton at = 15 GeV2(left) and 
deuteron/proton ratio F2 / F2 aX = 7 GeV^ (right). Second row: positron-proton NC 
(left) and CC (right) reduced cross sections at = 6.5 GeV^. The value of y is deter- 
mined as a function of x with ^/s = 301 GeV. Third row: neutrino-proton total reduced 
cross-section at = 12 GeV^. The value of y is determined as a function of x with 
Ey = 70 GeV. Theoretical uncertainties are all one-o" bands, while experimental error bars 
are total uncertainties with statistical and systematic errors added in quadrature. 

33, 88] included in the LHAPDF library [89] are also shown. As one might expect, the 
differences between predictions obtained using different parton sets are smaller for these 
observables than they are for the parton distributions themselves (compare Figs. [THE]). 
This is unsurprising given that these data (with the exception of the CHORUS data) have 
been used in the determination of all these sets. 
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In Table [T7] and Fig. [16] we show the total cross sections for W and Z production. 
These processes have been proposed as luminosity monitors at the LHC, but they remain 
sensitive to various aspects of PDFs [41,91]. All cross sections have been computed at 
NLO using MCFM [97-100] , using a sample of A'rep = 100 replicas, which as discussed in 
Sect. 15.71 is fully adequate for this purpose. The results are compared to those obtained 
using the MRST2001E [33], CTEQ6.1 [28] and CTEQ6.5 [30] sets. Note that the values 
given for CTEQ6.5 differ somewhat from those published in Ref. [30], which were however 
obtained using WTTOT [101]; results for MRST2001E and CTEQ6.1 coincide with those 
of Ref. [41], also obtained using MCFM. 
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Table 17: Cross sections for gauge boson production at the LHC. All quantities have been com- 
puted at NLO with MCFM [97-100] The quoted uncertainty is the one-cr band due to the PDF 
uncertainty only. 
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Figure 16: The vector boson production cross sections of Table [171 

We find good agreement of central values with the CTEQ6.1 computation. This is as it 
should be, given that CTEQ6.1 uses a zero-mass variable- flavour number scheme for heavy 
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quarks as we do (see Sect. 13. 4p . The difference between results obtained with CTEQ6.1 
and CTEQ6.5 is explained in Ref. [91] as a consequence of the different treatment of heavy 
quark thresholds. The discrepancy between CTEQ6.5 and MRST2001, which use a similar 
treatment of heavy quark thresholds, is resolved [91] when comparing to the most recent 
MSTW NNLO set [36]. Our estimate of the uncertainties is a little less than that of the 
CTEQ fits, but is rather larger than that of MRST. Note however that unlike CTEQ and 
MRST, NNPDFl.O does not yet include any Drell-Yan data in the determination of the 
PDFs. 
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6 Conclusions and outlook 



We have presented a set of parton distributions, determined using a new method which is 
designed to provide results that are unbiased and amenable to statistical analysis for the 
probability distribution of PDFs. 

The main features of this approach and of the NNPDFl.O parton set based on it are 
the following: 

• The parton set is given as a Monte Carlo ensemble of replicas of the parton dis- 
tributions which form the basis of independent PDFs. This ensemble provides a 
representation of the probability distributions in the space of PDFs, whence any 
statistical property of the parton distributions themselves or of any function(al) of 
them can be calculated using standard statistical methods. It is thus easy to compute 
not only central values and uncertainties, but in fact any desired property such as, 
for example, the correlation between a certain cross section and, say, the momentum 
fraction carried by a particular parton distribution. 

• The parton parametrization, based on neural networks, is extremely redundant and 
independence of results on it can be verified explicitly. It is thus possible to verify 
that the reduction in the uncertainties of the predictions from the fit as compared 
to those of the input data is due to the combination of many consistent data which 
obey an underlying physical law, and not to parametrization bias. A small residual 
dependence of results on the preprocessing in the neural networks can be accurately 
assessed and monitored. 

• A single parton parametrization is flexible enough to reproduce data sets of different 
sizes, as demonstrated by the fact that as data are removed, uncertainty bands 
increase to accommodate the increasingly large fluctuations of central values. This 
in particular implies that uncertainty bands in the region where there are no data 
are determined by interpolation or extrapolation of the fluctuations observed in the 
data region, and not by the form of the parametrization. 

• Results behave according to standard statistical expectations upon changes in the 
size of the Monte Carlo ensemble. This enables the consistent use of samples of 
different size according to the quantity one wishes to compute: for example, a small 
sample is often sufficient for the accurate determination of an average thanks to the 
fact that the variance of the average of quantities is smaller by a factor 1/A^ than 
the variance of each of them. 

• Uncertainty bands are produced directly from the propagation through the Monte 
Carlo and fitting procedure of the uncertainty on the underlying data, and do not 
require the use of any tolerance criterion. 

• The best fit is not determined as the absolute minimum for a given functional form, 
but rather it is found by the cross-validation method of comparing to the quality of 

the fit to a (randomly selected) control data sample. As a consequence, inconsistent 
data or underestimated uncertainties do not require a separate treatment, and are 
automatically accounted for and signalled by a larger than average value of the 
per degree of freedom. 
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The results discussed here provide a first full parton set based on this novel methodol- 
ogy, which can be improved in many respects. The main improvements which we envisage 
are the following: 

• A wider set of data besides deep-inelastic scattering should be included. This will 
improve the accuracy of quantities such as the gluon at large x (from jet data) or 
the light antiquark sea asymmetry (from Drell-Yan data), and it will make possible 
a direct determination of the strange distribution which we did not determine here. 

• Heavy quark thresholds should be treated more accurately, by including terms pro- 
portional to the heavy quark mass, i.e. to -^f-. Because heavy quark distributions 
are generated radiatively by matching in the threshold region, this improves the 
accuracy in the determination of heavy quark fractions at high scale. 

• Complete independence of the preprocessing of neural networks should be obtained 
by randomizing the preprocessing functions, as discussed in Sect. 15.41 Similarly 
contributions from theoretical uncertainties such as the choice of factorization scale 
could be included by randomizing the corresponding parameters, as discussed in 
Sect. \EM 

• A set of LO parton distributions should be produced in view of its use in Monte 
Carlo generators (see Ref. [102]). A set of NNLO parton distributions should be 
produced, both with the purpose of estimating uncertainties on the NLO results and 
also for some precision applications. Sets of partons including large and small x 
resummation corrections should also be considered. 

• An optimized small set of parton distributions should be produced which reproduces 
as closely as possible the statistical features of the full set, as briefly discussed in 
Sect. 15.71 This would then allow the efficient computation of detailed statistical 
features without having to use the full ensemble of a thousand replicas. 

None of these improvements (with the possible exception of the last) requires concep- 
tually new tools: they can all be made by straightforward generalizations of the techniques 
discussed here. Many of them will become relevant only as the quality and accuracy of 
the fit improves. More importantly, it will be possible to monitor this improvement, since 
we now have at our disposal a statistically reliable and accurate PDF fitting tool. 
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A Kernels for Physical Observables 



In this appendix we expand the physical observables — structure functions for DIS from 
protons and deuterons (measured by SLAC, BCDMS and NMC), the reduced cross- 
sections for neutral and charged current DIS from protons (measured by ZEUS and HI), 
and the neutrino cross-sections from a deuteron target (measured by CHORUS) — in 
the basis of PDFs given in Sec. 13.41 Eqs. (|55ll57p . This yields explicit expressions for 
the Mellin transforms of the hard kernels, in the form Eq. (j70p . which after Mellin in- 
version Eq. (j7ip can then be used to compute the observables through evaluation of the 
convolutions (|72l73p with the initial PDFs determined from the neural nets Eq. ()87p . 

The expressions below can be used with evolution kernels and coefficients evaluated 
at any order in perturbation theory. Currently evolution kernels T^'^ and Eg', i,j = q,g, 
described in Sec. [321 can be computed at NLO [69-74] and NNLO [75,76], while the DIS 
coefficient functions C/,i, I = 2,3, L, i = q, g are known at NLO [103], NNLO [104-108] 
and NNNLO [109-112].' Note that at NLO T^^ = ^NS = ^ns = ^nS' <^I,<? = <^/,9' while 
at LO r^g = rj;:jg = rj^g = and C/,g = Cl,, = 0, while C2,q = Cs^g = 1. 

The heavy quark evolution factors Eg"^'*, Eg^'*, i = q, g in the ZM-VENS are given in 
Eqs. (I62|64p . Below the top threshold we can set Eg^''^ = Eg''', Eg^'^ = Eg^, while below 
the b threshold Eg'^''^ = Eg^''^ = Eg''', Eg^'^ = Eg^'^ = Eg^, with some simplification of the 
results. 

Throughout this section we use a condensed notation in which the arguments of fi, 
Fj, T, C and K are all suppressed: parton distributions evaluated at Qq are denoted by a 
subscript 0, e.g. Eq = T,{x,Qq). We will assume throughout that Qq = ml, as explained 
in Sec. l3.4[ We also assume that 6 = 6, i.e. V24 = VaXQ^ = m^, and similarly t = t, i.e. 
V35 = V at Q'^ = rrif. However we do allow for the possibility of intrinsic charm, even 
though for the parton fits described in this paper this option is not exercised. The specific 
flavour assumptions described in Sec. 13.61 can be summarised as Tg^o = "^2+0" ^0 ~'~ 2+c 
(see Eq. (I67D), Tis.o = Sq, Fg.o = ^15,0 = ^o- 

Target mass corrections are included in the kernels using the procedure described in 
Sec. EE 

A.l Structure Functions: SLAC, BCDMS and NMC 

The datasets we include in our analysis include measurements of F2, F2 and the ratio 
F2/F2. These observables can be written in terms of the linear combinations of parton 
densities used in the evolution code (defined in Eq. ([55]) and Eq. ([56|) ): in the quark model 
the proton structure function 

= X ^ ^ql = x{^E + \T, + ^(Tg - T15) + U^^i " ^35)}, (98) 

i=l 

where Cj are the quark charges (| for u,c,t, — for d,s,h), while the deuteron structure 
function 

Fi = \{F^2 + F2) = ^{^S + ^(Tg - ri5) + - Tas)} • (99) 
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In perturbative QCD, we have 

= x{^C2%®s + iC2,,®(r3 + i(r8-ri5) + i(r24-r35)) 

Hel)C2,g®9}, (100) 
Fi = x{^C2%0S + iC2,,®(i(T8-Ti5) + i(r24-T35)) + (e2)C2,g0 5}-(lOl) 

where denotes the Melhn convolution, defined in Eq. (I19p . and (e^) is defined as 

"/ 

{el) = ^X-l (102) 

i=l 

where nj is the number of active flavours: (e^) = |, jg for = 3,4,5,6. We can 

thus write 

= x{ifF2,S®So + KF2,9®50 + ^S^F2,+ ® (r3,o + |(T8,o-Ti5,o))}, (103) 
Fi = x{iVrF2,S®So + KF2,9®50 + KF2,+ »(T8,o-Ti5,o)}, (104) 

where (in Mellin space) 

i^F2,E = ^C2%r^s' + M^2.,(r^''''-rf'') + (e^>C2,,rf, (los) 
i^F2,, = ^c^2%rf + ^C2,,(r2s'''-rf^) + (e')c^2,,rf, (loe) 

i^F2,+ = |C2,,r+g. (107) 

A. 2 Neutral Current Reduced Cross-Sections: ZEUS and HI 

For high energy DIS experiments, as is the case for ZEUS and HI, the contribution from 
weak boson exchange cannot be neglected. 

The neutral current deep-inelastic scattering reduced cross section Eq. Q is 

~NC,e± ^ pNC ^ ^^pNC _ ^pNC ^ (lOg) 
with 1± = 1 lb (1 — y)^. In the quark model 

Fr = F',+Y.B,qi, Ff^ = Y,D,q-, (109) 

i=l i=l 

where F2 is the purely electromagnetic structure function, and the charge factors are 

B,{Q^) = -2e,VeV,Pz + {v! + Al){V^ + A^)Pl, (110) 

Dg(g2) = -2eqAeAqPz + WeAeVqAqPl, (111) 

where Pz = Q"^ /{Q"^ + ), V/ and Af are the vector and axial couplings of the fermions 
to the Z boson, as given in Table [TSl In terms of the PDF evolution eigenstates Eq. (j55ll57p 



Ff"" = x{i?g+s + £;+g(r3 + i(r8-Ti5) + i(r24-r35))}, (112) 

F3^C = Fg-F + F-g(y3+ 1(^8-^15) + i(V^24-F35)), (113) 
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fermions 






Af 


u,c,t 


+2/3 


(+l/2-4/3sin^ 


+1/2 


d,s,b 


-1/3 


(-l/2 + 2/3sin2 0^) 


-1/2 







+1/2 


+1/2 


e,fi,T 


-1 


(-l/2 + 2sin2 Ow) 


-1/2 



Table 18: Coupling of fermions to the Z boson. 



where the charge coefficients 



E- 
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h{E>u + Dd), E^^ = ^{Du-Dd) 



(114) 



In perturbative QCD we can thus write the reduced cross sections as 



^{{Cl, - ^cig) e^j: + Eg{C2,g - ^CL,g) ^ g 

+^C3,, {E^V + E^^iVs + liVs - Fis))), } 
where we have set V24 = V35 = V, and 



Eg = {el) + {B^^, 



where 



{B. 



rif « 



i=l 



g (-Bm + 2i3rf) 
\ {Bu + -Bd) , 



if Uf 

if n/ 



3, 
4, 
5, 
6. 



(115) 
(116) 

(117) 



li2B^ + 3Bd), if n/ 
3(S, + 5rf), if rif 

Note that the gluon coefficient function C2,g is the same quantity which appears in 
Eq. (jlOOp . In terms of hard kernels, we thus have 



x{Kf^C,T: ® So + -f^NCs ® 50 + -^NC,+ (Tsfi + ^(^8,0 - Tl5,o)) 



TKncv ® ^0 + ^Nc,- ® (Vs.o + 5(^8,0 - ^15,0))}, 



(118) 



where in Mellin space 



NC,g 
K-!<iC,V 



2,9 



ici,)E^Tf + £;,(C2,, - ^CL,,n'^ 



24,q 
S 



■i35,q 



2,, - ^C7i,^)i?+rg^ + E,{C,,, - ^C,,,)Tf 



^C,,,)E^,{Tf'^-Tf^^) 



(119) 

(120) 
(121) 
(122) 
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It is also convenient to write down here the expression for which we use to fix the 
positivity constraint and fit the HI Fl data: 

= x{Kfl,j: ® So + Kfl,9 ^ 9o + Kfl,+ ® (^3,0 + ^(Ts.o - Tis.o))}, (123) 

where 

i^FL,E = ci,,i?g+r^^ + ic^,,i?+g(r^4'^-rf''') + i?,C7z.,,r|'', (124) 

Kfl,, = Cl,E+Tf + lCL,,E^{Tl''^-rf'<^)+E,CL,grf, (125) 
i^FL,+ = E+sClA (126) 

A. 3 Charged-current scattering: ZEUS and HI 

For charged current deep-inelastic scattering, the reduced cross-section Eq. (|lip 

~cc,e± ^ l(Y+F^^^'^ T Y^xF^^''^ - y^Ff ^'^*) , (127) 

where the factor of a half comes from the average over the helicity of the incoming leptons. 

For charged current and neutrino scattering the inclusion of the third generation is 
more subtle than for neutral current scattering, since the scattering of involves a 
transition between b and t. In the ZM-VFNS this means that below the top threshold 
neither b nor t can contribute to the cross-section, even above b threshold [113]. We must 
thus consider the linear decomposition of the structure functions below and above the top 
threshold as two separate cases. Of course, all the current data are effectively below top 
threshold, so the second possibility is at present rather academic. 

Below top threshold (but above charm threshold) in the quark model 

^2^*^'^"^ = 2x{u + d + s + c}, F^'^'''~ =2x{u + d + s + c}. (128) 
F^'^'''^ = 2{-u + d + s-c}, F^'^'''' =2{u-d-s + c}. (129) 

Below charm threshold both the charm and the strange contributions would be absent. In 
terms of the PDF evolution eigenstates Eq. ()55ll57p we thus have 

Ff^'^^ = x{{l^ + lT,,)TiV, + l{Vs-V^,)}, (130) 
pCCe^ = {IV + lV24)T{T3 + l{Ts-Ti5). (131) 

The first term in Eq. (1130p is subtle: (|S + ^124) = + d'^ + + c'^ both above and 
below the b threshold, so the b quarks do not couple to the W^, consistent with Eq. (jl28p . 
Similar considerations apply to the first term in Eq. (jl3ip . In perturbative QCD we thus 
have 

Ff^''^* = x{C|_^0(|s + ir24)TC2,, 0(1^3 + ^(14 -145)) 

+rfC2,g^g}, (132) 

FfC'^* = ^{C£_^^(4s+lr24)TCi,,®(V3 + i(y8-^15)) 

+rfCL,9^g}, (133) 

Fg^^'^* = c|,,®(|^ + i^24) + C3,,®(r3 + i(r8-ri5)), (i34) 
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where the factor rj counts the proportion of active flavors: denoting with [n] the integer 
part of n, 

{1 when Uf is even 
1 — — when Uf is odd. 

The gluon coefficient functions are the same quantities which appear in Eqs. ()100lll5p . 
We can thus write the reduced cross sections as 

2FCC,e± ^ ix{i^cC,S ® So +KcC,9®ffO + i^CC,+ ® (73,0 + 1(18,0 -Ti5,o)) 

TKccv ^VoT Kcc- ® (V3,o + 5(V8,o - V^i5,o))}, (136) 

where in Mehin space 

+rf{Y+C2,g-y^CL,g)Tl'^, (137) 

+r^(y+C2,,-y2CL,3)rf, (138) 

Kcc,+ = >^-C3,gr+s, (139) 

Kccy = Y.Cl^{pl^ + \Tl%l (140) 

Kcc- = {y+C2s-y^CL,,)T-^. (141) 
Above top threshold, i^cc.s, ^cc,g and Kqq^ become instead 

+ lY^C,,,{Tf'^ -Tf% (142) 
i^^Cs = {Y+Cl^-y'CUn' + rf{Y+C2,g-y^CL,g)Tf 

+iy_C3,,(rf'^-rf'^), (i43) 

Khcy = Y.Cl^Tl^ + \{Y+C2,,-y^CL,,){Tl%-T^^^). (144) 
A. 4 Neutrino Scattering: CHORUS 

The cross-section for high energy neutrino scattering off an isoscalar nucleon is given by 
Eq. (|12|) . which we write as 

= k[Y+F!^^^'^ - y^Ff^ ± y_ xF^^% (145) 

where k = GIMn /{2t^{1 + Q^/M^f), Y+ = Y+- 2M^x^yyQ^. 

Remembering that the target is isoscalar, in the quark model below the top threshold, 
but above charm threshold, 

F^ = x{u + u + d + d + 2s + 2c}, F^ = x{u + u + d + d + 2s + 2c}, (146) 
= u-u + d- d + 2s-2c, F^ = u - u + d - d - 2s + 2c. (147) 

Below charm threshold both the charm and the strange contributions would be absent. 
Of course these expressions can be easily deduced from Eqs. (I128ll29p . since 

^l3 = ^f, ^f,3 = ^f- (148) 
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In terms of the PDF evolution eigenstates Eq. (j55ll57p we now have in perturbative QCD 

F2"(^) = x{Clg0{lJ: + lT24)TkC2,,^{Vs-Vi5) + rfC2,g0g}, (149) 
Ff^ = x{Ci,,® (|S + iT24) ^iCi,, 0(^8 -^15) + r/Ci,^ 05}, (150) 
^3^'^ = Clg0{IV + lV24)TlC3,g0iTs-Ti,), (151) 

the b threshold being accommodated in the same way as in Eq. ()132p . and with rj defined 
as in Eq. (I135p We can thus write the neutrino cross sections as 

a^(^) = Kx{i^^,E So + K^,g 0go + K^,+ ^(78,0 - ^15,0) 

^K,y ^VoT K,^- ^ ki^8,o - ^15,0)}, (152) 



where in Mellin space 



+rf (Y+C2,g - y'CL,g) (153) 

+rf(Y+C2,g-y^CL,g)Tf, (154) 

= y-C3,gr+s, (155) 

K.y = Y.CIjms + l^Ns), (156) 

= {Y+C2,g-y'CL,g)T^s- (157) 
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B Computation of PDF uncertainties 



Within our approach, the expectation value of any function .^[{(?}] which depends on the 
PDFs is computed as an average over the ensemble of PDFs, using the master formula 

^^-t k=l 

where A'set = -^rep is the number of sets of PDFs in the ensemble, equal to the number 
of replicas. The associated uncertainty is found as the standard deviation of the sample, 
according to the usual formula 



(7^r;^E(^H'''"}l-(^[Ml>) j ■ C^f) 



These formulae may also be used for the determination of central values and uncertainties 
of the parton distribution themselves, in which case the functional is identified with the 
parton distribution q : T[{q}] = q. 

A full discussion of statistical estimators for our PDF set, including correlators and 
covariances, is given in Appendix B of Ref. [44]. Here we briefly compare the procedures 
we use to determine central values and errors with those used for the various other PDFs 
available through HEPDATA. 

Available methods for the determination of PDF uncertainties fall broadly into two 
distinct categories, which we shall refer to as the HEPDATA method (used as a default in 
the PDF server at the HEPDATA database [19]) and the Monte Carlo method. In both 
methods sets of PDFs with uncertainties are given as an ensemble of A'set sets of PDFs, 

{(7^*^)} , A; = 0,...,A^set. (160) 

Conventionally the PDF set q^^^ corresponds to a "central" set. 

In the HEPDATA method, the central set is a best fit set of PDFs, which thus provides 
the central value for PDFs themselves. The central value of any quantity ./^[{(?}] is obtained 
in this method by evaluating it as a function of the central set: 

^(0) =^[{g(0)}]. (161) 

In the Monte Carlo method, the central values of any quantity ^[{g}] is instead given 
by Eq. ()158p . Therefore, the central value for PDFs themselves is given by 

fc=l 

This set is provided as set q^^^ in the NNPDFl.O PDFs. Hence, in the Monte Carlo method 
the central (best fit) PDF is obtained as an average of the replica best fits. However, for 
any quantity .^[{^}] which depends nonlinearly on the PDFs 

iniQ}]) ^ H{q^''>}]- (163) 



67 



Hence, Eq. (jl58p must be used for the determination of the central value, and use of the 
set g^''^ is not recommended. However, for a quantity that does depend linearly on the 
PDFs, such as a DIS structure function, Eq. p6ip with the central PDFs Eq. p62|) gives 
the same result as Eq. (jlSSp . and thus it may be used also with the Monte Carlo method. 
Note that set q^^^ should not be included when computing an average with Eq. (|158p . 
because it is itself already an average. 

The determination of uncertainties with the HEPDATA method is based on the idea 
that sets q^'^^ with A: > provide upper and lower variations (for even and odd values of k) 
away from the central set q^^^ which correspond to eigenvectors in parameter space. The 
one-o" uncertainty is then found by adding in quadrature these variations: 

4^^'^*^ = ^ E i^iW^''-'^}] - nu^'"^}]) 1 , (164) 

where the factor 

Cgo = \/2Erf-i[0.90] = 1.64485 (165) 

accounts for the fact that the upper and lower parton sets correspond to 90% confidence 
levels rather than to one-o" uncertainties. This method should be used with the CTEQ 
and MRST/MSTW sets Refs. [28-36]. 

A slightly different application of the HEPDATA method is required for the Alekhin 
PDF sets Ref. [38-40,94]. With these PDFs, sets g^'') with k> each provide the uncer- 
tainty limits from the central set, with upper and lower PDFs symmetrical by construction 
and already corresponding to one-o" uncertainties. So for these PDFs 
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/M \ 1/2 



vfc=l 



E ■ (166) 



In the Monte Carlo method, instead, the one-cr uncertainty is found using the variance 
formula Eq. (jl59p . This formula should be used not only for the NNPDFl.O sets, but also 
for the parton sets of Refs. [25,27,37]. 
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